FOREWORD

This document describes part of a key element of a long-range
research effort for improving the selection, classification, and
utilization of Army enlisted personnel. The thrust for this
effort came from the practical, professional, and legal need to
validate the Armed Services Vocational Aptitude Battery (ASVAB)
and other selection variables as predictors of training and
on-the-job performance. The portion of this effort described
herein is devoted to the development of a longitudinal research
data base (LRDB) to support the research being undertaken by ARI
in-house and through two major contracts: the first being
devoted to the development and validation of Army selection and
classification procedures (Project A), and the second (Project B)
devoted to the development of a Prototype Computerized Personnel
Allocation System (EPAS). Together these Army Research Institute
efforts, with their in-house and contract components, comprise a
landmark program to develop a state-of-the-art, empirically
validated personnel selection, classification, and allocation
system. The work towards the development of the LRDB is funded
primarily by Army Project Number 2Q263731A792.




EDGAR. JOHNSON 
Technical Director 
1.1. Nature and Purpose of the Overall Project

The Army Research Institute (ARI) is currently funding
two large-scale research projects in order to develop a new
selection and classification system that will improve the
efficiency of personnel utilization in the Army. The
purposes of the first project, Project A: Development of
Improved Army Selection and Classification Systems, are:

(1) to validate current predictors (primarily the
    ASVAB composites, supplemented by other available
    predictors such as high school graduation,
    Military Applicant Profile (MAP) data, and
    physical capacities);

(2) to develop new or improved predictors and
    performance measures; and

(3) to conduct a longitudinal validation of current
    and newly developed classification measures for
    prediction of the enlistee's performance from
    training through the second tour of duty.

The second project, Project B: Development of an
Enlisted Personnel Allocation System (EPAS), is to develop
a state-of-the-art system, for implementation at the
Military Entrance Processing Station (MEPS), to facilitate
the initial enlistment decisions. The success of the EPAS
depends on a set of cost-effective classification measures
that can accurately predict the recruit's future
performance throughout his/her Army career. The major
objective of Project A is to ensure that such a set of
predictors is available to drive the new allocation system.
Thus, although the two projects are being conducted
separately, they have a common goal: to improve the Army's
personnel decision mechanism and thereby increase the
overall effectiveness of the Army.

The selection of a set of cost-effective
classification measures for use in the EPAS requires first
a careful evaluation of the relationships of the current
predictors to performance on Army jobs. The Army presently
has a systematic testing and validation program to support
its current selection practice. Specifically, the Armed
Services Vocational Aptitude Battery (ASVAB) is administered
to all applicants, and aptitude area composites are
developed for use in the initial selection and assignment
to training courses. The ASVAB composites are
traditionally validated in terms of the extent to which
they predict the enlistee's success in training
(essentially measured by course grades). Although these
composites generally are quite effective in predicting how
well the enlistees will perform during training, their
validities for predicting other important areas of Army
performance (general soldiering and job-specific
performance) have not been extensively investigated
because valid, sound, and economical measures of these
additional aspects of performance are not currently
available.



In addition to valid predictions of performance, the
new EPAS will require information on the relative utility
to the Army of different levels of performance in different
MOS. The collection and analysis of such data is another
major objective of Project A.


While the greatest concern is with initial selection
and classification decisions, Project A will also address
subsequent personnel allocation decisions. Two major
decision points will be investigated. The first point is
the decision at the end of training whether to pass a
recruit out of training, or to recycle him or her into
additional training for the same or some other MOS, or to
drop him or her from the Army altogether. At this point
both pre-induction and training performance measures will
be used to predict subsequent performance in the MOS. The
second major decision point is at the completion of each
soldier's first tour. At this point the Army must decide
whether to encourage or to bar the soldier's reenlistment
for a second tour. Here first-tour performance measures
must be used, along with the training and pre-induction
measures, to predict second-tour performance. Thus, valid
training and first-tour performance measures are needed
both as criteria for the validation of earlier prediction
measures and as predictors of subsequent criteria.

In order to accomplish these objectives, the project
is organized into five major tasks. These are:

Task 1: Validation

Task 2: Prediction of Job Performance

Task 3: Measurement of School/Training Success

Task 4: Assessment of Army-wide Performance

Task 5: Development of MOS-Specific Performance
        Measures

In the course of developing these new or improved
measures, there will be pilot field tests in order to
assess the psychometric characteristics of the measuring
instruments and the ease of their administration. Based on
these pilot tests, the instruments may be revised and then
employed in a large-scale field test to collect data for
two purposes. The first purpose will be to evaluate
formally the effectiveness of the experimental set of
pre-induction predictors for predicting success in the
Army. The second purpose will be to examine the practical
value of using early performance measures as additional
predictors of later performance. The results of these
evaluations will be employed to guide further refinements
of the experimental measures. Through this iterative
development and refinement process, a final set of
predictor and performance measures will be selected and
administered to a cohort of enlisted personnel. The data
collected in these administrations will be analyzed for the
validation of the classification battery to be employed in
the Army's new selection and classification system.



In conjunction with this complete, longitudinal
validation of the classification measures, prediction
models of enlistees' performance in the Army will be
obtained to generate numerical inputs into the EPAS for
determining optimal person-job matches. The development of
a dynamic allocation procedure will be most useful to the
Army if it uses information accumulated through the early
period of the enlistee's career to modify post-enlistment
decisions at various choice points (post-training
reassignment, promotion, and reenlistment). The capability
of the EPAS can be enhanced by incorporating such a dynamic
decision process. EPAS would then be used to assist in
personnel decisions beyond the initial selection and
training assignment made at the MEPS level.


1.2. Role of the Longitudinal Research Database (LRDB)

Clearly, the research process of Project A will
generate a large amount of interrelated data that must be
assembled into an integrated data base that can be accessed
easily by the research teams for various analytical
purposes. Therefore, one of the major tasks in Project A
is to establish and maintain the longitudinal research data
base (LRDB). This data base will link together data on
diverse measures gathered in the various tasks of Project A
and, in addition, incorporate existing data that are
routinely collected by the Army. Such a comprehensive LRDB
will enable Project A to conduct a full analysis of how
information gathered at each stage of the enlistee's
progress through his/her Army career can add to the
accuracy of predicting later performance.

The richness of the LRDB to be created for the project
will not only facilitate efficient validation analyses that
concern Project A, but will also enable Project B to test
and revise the prototype selection/allocation system.
Specifically, Project B will employ data on the training
time, subsequent performance, and the utility of subsequent
performance levels to develop the classification model and
estimate required parameters.

Indeed, the function of the LRDB extends beyond
Projects A and B. The data base can also support other
research work to be conducted by the ARI staff to address
specific policy issues that may arise.

1.3. Overview of LRDB Contents

In accordance with the Project A Research Plan, three
major sets of data will be assembled within the LRDB. The
first set will consist of already existing data on FY81/82
accessions. These data will include accession information
(demographic/biographical data, test scores, and enlistment
options), training success measures, measures of progress
or attrition taken from the Enlisted Masterfile (EMF), and
specific information on Skills Qualification Test (SQT)
scores. This first set of data will be employed to
validate the current version of the Armed Services
Vocational Aptitude Battery (ASVAB) insofar as that can be
done with available criteria. (This cohort was the first
to receive Forms 8, 9, and 10 of the ASVAB.)
Recommendations will be made for revisions in the ASVAB
Area Composite scores to be used in classification
decisions until EPAS becomes fully functional. These
analyses of the existing battery will also produce
recommendations for needed additions to the predictor
battery. Furthermore, the investigation of methodological
and conceptual issues that have plagued personnel decision
research will begin with the data on this cohort so that
practical solutions may be devised for the validation
analyses on two subsequent cohorts. (See Task 1 of the
Project A Research Plan.)

The second and third sets of data to be assembled into
the LRDB will involve substantial new data collection
efforts, in addition to the routinely collected data
described above. The second set of data will consist of
longitudinal information on FY83/84 inductees. This
information will be acquired in three data collection
phases:

(1) Beginning in the summer of 1983 and continuing
    until summer of 1984, samples of recruits will be
    administered a preliminary predictor battery
    (consisting of available tests that are not
    currently employed by the Army but that have
    potential value for predicting performance on
    Army jobs). These data will be analyzed to
    determine the incremental validity of the new
    tests (tests such as vocational interest and
    motivation scales) over the existing predictors
    (in essence, the ASVAB scores). The results of
    this evaluation will help guide the development
    of new preinduction predictors. (They will be
    developed to be parallel to the preliminary
    predictors that are found to be effective for
    predicting performance.)

(2) Later in their first tour (June 1985 to October
    1985), data on a revised set of predictors, job
    knowledge tests, and Army-wide and MOS-specific
    performance measures under development by Project
    A will be obtained from this sample. These data,
    together with the existing data on current
    predictors and school performance indicators,
    will be employed to conduct a concurrent
    validation of the initial predictors using both
    school measures and subsequent performance as
    criteria. The school measures will also be
    validated as predictors of subsequent
    performance. The findings from this concurrent
    validation analysis will provide the basis for
    revision and improvement of new instruments and
    for choosing the most cost-effective of them for
    inclusion in the final set of classification
    measures.

(3) For members of this sample who reenlist,
    Army-wide and MOS-specific performance measures
    will be collected during their second tour (June
    1988 to September 1988) and merged with existing
    EMF measures. These data will be used to
    validate the pre-induction selection measures and
    early performance in the Army as predictors of
    second-tour performance.
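
The incremental validity analysis named in phase (1) can be
illustrated with a small sketch. It assumes the simplest possible
case: one existing predictor (for instance an ASVAB composite), one
new test, and one criterion score per soldier. The function names
and the sample numbers are hypothetical and are not taken from the
Research Plan; the planned analyses would involve many predictors
and corrections that are not shown here.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def incremental_validity(existing, new, criterion):
    """Gain in squared multiple correlation when `new` is added.

    Uses the two-predictor formula
        R^2 = (r_ye^2 + r_yn^2 - 2 r_ye r_yn r_en) / (1 - r_en^2)
    and subtracts the squared validity of the existing predictor alone.
    """
    r_ye = pearson_r(existing, criterion)  # existing predictor vs. criterion
    r_yn = pearson_r(new, criterion)       # new test vs. criterion
    r_en = pearson_r(existing, new)        # overlap between the predictors
    r2_full = (r_ye ** 2 + r_yn ** 2 - 2 * r_ye * r_yn * r_en) / (1 - r_en ** 2)
    return r2_full - r_ye ** 2


# Illustrative (made-up) scores for four soldiers.
existing_scores = [1, 2, 3, 4]
new_scores = [1, -1, -1, 1]
criterion_scores = [2, 1, 2, 5]
gain = incremental_validity(existing_scores, new_scores, criterion_scores)
```

A gain near zero would suggest the new test adds little beyond the
existing predictor; a larger gain would argue for carrying the test
forward into the revised battery.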

Once the new measures are refined on the basis of the
analyses of the FY83/84 cohort data, they will be
administered to a new cohort (FY86/87 inductees) to allow a
complete, longitudinal validation of the final
classification battery. The data for this final validation
will also be collected in three phases. Briefly, the three
data collection points are:

(1) From March 1986 until February 1987, samples of
    recruits from the 19 focal MOS will be tested
    with the revised predictor battery, and their
    school data will be obtained.

(2) During their first term of enlistment (June 1988
    to September 1988), the sample will be followed
    up and the Army-wide and MOS-specific performance
    measures will be obtained. These data will not
    only support the predictive validation of the
    classification measures but will also permit
    analysis of criterion equivalence between
    training and first-tour performance and between
    Army-wide and job-specific performance.

(3) Similarly, during their second tour (January 1991
    to March 1991), Army-wide and MOS-specific
    performance measures will again be obtained from
    this sample. These data will be used for the
    longitudinal validation of the predictor measures
    and the investigation of criterion equivalence
    between first-tour and second-tour performance
    measures.

1.4. Specific Objectives of the LRDB Plan 

As pointed out in the preceding section, the primary
role of the LRDB is to support efficient data analyses as
required by the research teams of both Projects A and B.
To fulfill this role, the LRDB must be created and
maintained in coordination with the data collection
activities and the research process. The data collected
throughout the research process of Project A and the data
to be acquired from existing Army files must be organized
and stored in such a way that they are simple and
economical to access. Accordingly, the LRDB plan must meet
the following objectives:

(1) To develop systematic and efficient procedures
    for entering and editing the data.

(2) To establish linkages of data from various
    sources and resolve all data inconsistencies.

(3) To develop and maintain complete documentation of
    the data organization and contents.

(4) To store both the data and the documentation
    cost-effectively and to provide fast and easy
    access to both simultaneously.

(5) To ensure the security and integrity of the data.

2. CONTENTS OF THE LRDB

The adequacy of the LRDB depends heavily on the
specific variables that are included. It is not
sufficient, for example, to specify that "training
measures" will be included. The planning and conduct of
the validation analyses require knowledge of the specific
predictor and criterion measures that will be available and
of the data elements that provide qualifying information.
Unfortunately, the degree to which specific variables can
be listed varies widely for the three main cohorts. For
the FY81/82 cohort, most of the specific variables that
will be available are now known. For the FY83/84 and
FY86/87 cohorts, a great deal of work will go into defining
and collecting new items of information. It is not
possible to give a complete list of specifics at this time.

What follows is a listing of specific data elements to
be included in the data base. This list should be helpful
to all project staff in planning both analyses and future
data collections.

Variable names. Before proceeding, however, a word is
in order on conventions and standards regarding variable
names. There is a wide range of possible naming
conventions, ranging from a strict numbering (e.g., VAR129)
to acronym conventions (e.g., HTIN for height in inches) to
single word descriptors (e.g., HEIGHT). The naming
convention to be used in this project will combine (a) a
two-character prefix indicating data source with (b) a
descriptive label of up to six characters. (Note that
eight characters is the maximum variable name length in
most statistical packages.)

The two characters indicating data source will consist
of an initial character designating the type of data,
followed by a sequencing character (1 through 9 and then A
through Z). The type of data codes are:

    A - existing applicant/accession data
        (including existing predictors)

    B - new predictor battery data

    E - Enlisted Masterfile data (including
        existing Army-wide performance measures)

    G - new general (Army-wide) performance data

    P - existing MOS-specific performance data
        (e.g., SQT)

    M - new MOS-specific performance data

    T - existing school/training data

    S - new school/training data

Additional codes may be defined for derived variables that
combine data from different sources.
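
The naming convention described above can be sketched as a pair of
small helper functions. This is only an illustration of the rule
(type code, sequencing character, label of up to six characters);
the function names `make_variable_name` and `parse_variable_name`
are hypothetical and are not part of the plan.

```python
import string

# Type-of-data codes from the list above.
TYPE_CODES = {
    "A": "existing applicant/accession data",
    "B": "new predictor battery data",
    "E": "Enlisted Masterfile data",
    "G": "new general (Army-wide) performance data",
    "P": "existing MOS-specific performance data",
    "M": "new MOS-specific performance data",
    "T": "existing school/training data",
    "S": "new school/training data",
}

# Sequencing characters: 1 through 9, then A through Z.
SEQUENCE_CHARS = "123456789" + string.ascii_uppercase

MAX_LABEL_LEN = 6  # two-character prefix + six-character label = 8


def make_variable_name(type_code, sequence, label):
    """Build a name such as 'A1SSN' from its three parts."""
    if type_code not in TYPE_CODES:
        raise ValueError(f"unknown type code: {type_code!r}")
    if not 1 <= sequence <= len(SEQUENCE_CHARS):
        raise ValueError(f"sequence out of range: {sequence}")
    label = label.upper()
    if not label.isalnum() or len(label) > MAX_LABEL_LEN:
        raise ValueError(f"label must be 1-6 alphanumeric characters: {label!r}")
    return type_code + SEQUENCE_CHARS[sequence - 1] + label


def parse_variable_name(name):
    """Split a name into (type description, sequence number, label)."""
    if not 3 <= len(name) <= 2 + MAX_LABEL_LEN:
        raise ValueError(f"name must be 3-8 characters long: {name!r}")
    type_code, seq_char, label = name[0], name[1], name[2:]
    if type_code not in TYPE_CODES or seq_char not in SEQUENCE_CHARS:
        raise ValueError(f"invalid prefix in {name!r}")
    return TYPE_CODES[type_code], SEQUENCE_CHARS.index(seq_char) + 1, label
```

For example, `make_variable_name("A", 1, "SSN")` yields `A1SSN`, and
a label longer than six characters is rejected so that the full name
never exceeds the eight-character limit noted above.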
In cases where data are extracted from existing
datafiles, the established variable name will be used as
the descriptive portion of the variable name in our system
(in characters 3 through 8). EMF variable names, for
example, are limited to five characters and thus fit nicely
into the system. In other cases, it may be necessary to
shorten the name to six characters. Where appropriate,
more obvious mnemonics may replace the variable name in the
original file.

2.1. Data Elements for the 81/82 Cohort

2.1.1. 81/82 APPLICATION AND ACCESSION DATA

The Army collects a great deal of information on each
soldier at the time that he/she applies to and is accepted
into the Army. Some of this information is retained in
each soldier's permanent computerized records (the Enlisted
Masterfile), but much is not. Some information, such as
responses to individual ASVAB items, is not retained in
machine-readable form at all.

For the most part, analyses of the FY81/82 cohort will
be limited to data that are already in machine-readable
form. One exception will be information used in the
Military Applicant Profile (MAP) score. The MAP is a
battery of behavioral indicators now administered to all
applicants who have not completed high school. It has been
discovered that the overall profile score has not been
included in the computerized accession file. The answer
sheets, including responses to the approximately (SO
(noncognitive) items that comprise the profile, have been
retained at ARI and are available for entry. We plan to
enter this information for a sample of about 2,500
applicants to allow for analyses of the current MAP items.

The following items of information will be taken from
the existing accessions file and retained in the data base:

2.1.1.1. Basic Identifying Information

These data will be used to link the accession data to
the EMF data. The linking variables will NOT be stored on
the main data files: only a scrambled identifier will be
retained on the main data files for linking new data. The
data needed for linking and checking the validity of the
linkage are:

    A1SSN       SOC. SEC. NO.
    A1NAME4     4 CHAR NAME ABBREVIATION
    A1NAME      FULL NAME
    A1DOB       DAY OF BIRTH
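
The plan does not say how the scrambled identifier is produced. As a
hedged illustration only, the sketch below assumes a keyed hash
(HMAC-SHA256) of the Social Security number and shows how accession
and EMF records could be joined on it while the linking variables
themselves are dropped from the merged record. The function names,
the `SCRAMID` field, the key, and the choice of hash are all
assumptions, not the project's procedure.

```python
import hashlib
import hmac

# Linking variables that must not appear on the main data files.
LINKING_VARS = {"A1SSN", "A1NAME4", "A1NAME", "A1DOB", "E1SSN"}


def scramble_ssn(ssn, secret_key):
    """Return a scrambled identifier for an SSN (assumed scheme)."""
    digits = "".join(ch for ch in ssn if ch.isdigit())
    if len(digits) != 9:
        raise ValueError("an SSN must contain exactly nine digits")
    digest = hmac.new(secret_key, digits.encode("ascii"), hashlib.sha256)
    return digest.hexdigest()[:16]


def link_records(accession_rows, emf_rows, secret_key):
    """Join accession and EMF rows (dicts) on the scrambled identifier."""
    emf_by_id = {}
    for row in emf_rows:
        emf_by_id[scramble_ssn(row["E1SSN"], secret_key)] = row

    linked = []
    for row in accession_rows:
        scrambled = scramble_ssn(row["A1SSN"], secret_key)
        merged = {"SCRAMID": scrambled}
        merged.update({k: v for k, v in row.items() if k not in LINKING_VARS})
        match = emf_by_id.get(scrambled)
        if match is not None:
            merged.update({k: v for k, v in match.items() if k not in LINKING_VARS})
        linked.append(merged)
    return linked
```

Because the same key always yields the same scrambled value, new
files can be linked later without storing the SSN on the main data
files; the key itself would have to be held separately and securely.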
2.1.1.2. Individual Background Data

These data will be used to identify differences in
backgrounds that may be predictive of performance
differences. In addition, certain variables may be
valuable as moderators, predicting differences in the
relationships between predictor and criterion variables.
Other variables, such as ZIP codes, will permit links to
other (e.g., Census) data files containing information on
local geographic or economic conditions.

    A1HOMADD    STATE/COUNTY CODE OF HOME ADDRESS
    A1HOMZIP    HOME ZIP CODE
    A1PRSADD    STATE/COUNTY CODE OF PRESENT ADDRESS
    A1PRSZIP    ZIP CODE OF PRESENT ADDRESS
    A1MARST     MARITAL STATUS
    A1NRDEP     NUMBER OF DEPENDENTS
    A1YOB       YEAR OF BIRTH
    A1MOB       MONTH OF BIRTH

    A1CITIZ     CITIZENSHIP
    A1SEX       SEX
    A1RACE      POPULATION GROUP
    A1ETHNIC    ETHNIC GROUP
    A1RELIG     RELIGIOUS PREFERENCE

    A1HGT       HEIGHT
    A1WGT       WEIGHT
    A1PULHES    PHYSICAL PROFILE

    A1EDYRS     YEARS OF EDUCATION
    A1EDCERT    EDUCATION CERTIFICATION


2.1.1.3. Enlistment Information

These data describe the timing and conditions of
enlistment. This information is of primary importance in
the development of forecasting models by Project B. In
addition, some of these variables (e.g., entry date,
primary MOS, pay grade) provide the starting points against
which the relationship between the test scores and progress
will be charted. Other variables (e.g., moral waivers,
additional skill indicators) will be useful as additional
predictors.

    A1ENTDTE    ENTRY DATE
    A1PADDTE    PROJECTED ACT. DUTY DATE
    A1AITGRD    AIT GRADUATION DATE
    A1AAAS      ACCESSION TO ACTIVE ARMY STRENGTH
    A1UPSTAT    STATUS CODE
    A1ACTDTE    DATE OF ACTION
    A1SERV      BRANCH OF SERVICE
    A1PRMMOS    PRIMARY MOS
    A1TRNMOS    TRAINING MOS
    A1ENLOP     ENLISTMENT OPTION
    AlEKlOPTl   ENLISTMENT OPTION GUARANTEED
    A1DESGOP    DESIGNATED OPTION
    A1ENTPRG    PROGRAM FOR WHICH ENLISTED
    A1ENLTRM    TERM OF ENLISTMENT
    A1ENTST     ENTRY STATUS
    A1BONLVL    ENLISTMENT BONUS LEVEL
    A1ABGRD     ABBREVIATED GRADE CODE
    A1PAYGRD    PAY GRADE
    A1GRDDTE    DATE OF GRADE
    A1ASI       ADDITIONAL SKILL INDICATOR
    A1WAIVER    WAIVER TYPE
    A1MORWVR    REASON FOR MORAL WAIVER
    A1WAPLVL    WAIVER APPROVAL LEVEL
    A1AFEES     AFEES IDENTIFICATION


2.1.1.4. Prior Military Experience

    A1YTHPRG    YOUTH PROGRAM
    A1CONDBY    YOUTH PROG. CONDUCTED BY
    A1YRSCMP    NO. OF YEARS COMPLETED IN YOUTH
    A1PRISRV    PRIOR SERVICE
    A1SRVBRK    BREAK IN PRIOR SERVICE


2.1.1.5. Delayed Entry Program (DEP) Information

    A1DEPDTE    DEP DATE
    A1ENTDT2    DATE OF CONTRACT/ENTRY
    A1DISDTE    ENTRY OR DISCHARGE DATE
    A1ENTRST    ENTRY STATUS
    A1DEPPG     PROGRAM FOR WHICH ENLISTED
    A1DESDEP    DESIGNATED OPTION
    A1OPTDEP    ENLISTMENT OPTION
    A1TNGMOS    TRAINING MOS
    A1NODEPR    DEP NON-ENLISTMENT REASON

2.1.1.6. Hometown Recruiter Aide Program (HRAP) Information

    A1HRAP      HOMETOWN RECRUITER AIDE
    A1HRAPLC    HRAP LOCATION



2.1.1.7. Testing Information

    A1CYCL      DATE OF CYCLE NUMBER
    A1MCAT      MENTAL CATEGORY
    A1TSITE     TEST SITE
    A1TSESS     TEST SESSION
    A1ASVBFM    ASVAB FORM
    A1AFQTPC    AFQT PERCENTILE
    A1ASVBXX    ALL ASVAB SUBTEST SCORES

Current ASVAB Area Composite Scores

    A1ASCVGT    GENERAL TECHNICAL
    A1ASCVGM    GENERAL MAINTENANCE
    A1ASCVEL    ELECTRONICS
    A1ASCVCL    CLERICAL
    A1ASCVMM    MECHANICAL MAINTENANCE
    A1ASCVSC    SURVEILLANCE/COMMUNICATIONS
    A1ASCVCO    COMBAT
    A1ASCVFA    FIELD ARTILLERY
    A1ASCVOF    OPERATIONS/FOOD
    A1ASCVST    SKILLED TECHNICAL
    A1ASCVWS    AFWST (WOMEN ONLY)



Previous ASVAB Subtest and Composite Scores

    A1PASVFM    PREVIOUS ASVAB TEST FORM
    A1PAFQTS    PREVIOUS ASVAB TEST-AFQT
    A1PASVXX    PREVIOUS ASVAB SUBTEST SCORES
    A1PASCXX    PREVIOUS ASVAB COMPOSITE SCORES

2.1.2. TRAINING DATA

ARI has expended considerable effort to collect
training information on 1981 and some 1982 accessions.
These data indicate the timing and duration of training,
the course(s) taken, the overall outcome, and some measure
of performance in the course. It is important to note that
the nature of these performance measures varies widely by
school and sometimes by course or class within school.

2.1.2.1. Basic Identifying Information

These data will be used for identification purposes
only and will NOT be stored on the main data files; only a
scrambled identifier will be retained in the main data
files for linking in new information.

    T1NAME5     5 CHAR ABBREVIATION FOR NAME
    T1SSN       SOCIAL SECURITY NUMBER

2.1.2.2. School Identification Information

These data will be used to identify the school, the
class, and the course for which the scores on each file
have been collected.

    T1SCHOOL    SCHOOL/ATC CODE
    T1COURSE    NAME OF COURSE
    T1CLASS     CLASS ID NUMBER WITHIN COURSE
    T1MOSAWD    MOS AWARDED UPON COMPLETION
    T1SKLLVL    MOS SKILL LEVEL AWARDED



2.1.2.3. Students' Progress Through the Training Program

Essential to this project are the data which describe
each student's progress through training.

    T1ENRDTE    ENROLLMENT DATE
    T1GRDDTE    DATE OF RECYCLE, TRANSFER, OR
                GRADUATION
    T1ATTRIT    TYPE OF ATTRITION
    T1DISP      DISPOSITION (PASS, RECYCLE, TRANSFER
                OR DROP)
    T1SCORE1    STUDENT'S COURSE GRADE OR TEST SCORE
    T1STYPE1    TYPE OF SCORE
    T1SCORE2    SECONDARY PERFORMANCE MEASURE
                (FOR SOME MOS)
    T1STYPE2    TYPE OF ADDITIONAL SCORE
    T1SELECT    WAS SPECIFIC MOS GUARANTEED FOR
                BASIC INFANTRYMEN
    T1MORSE     MORSE CODE TAKEN FOR 05B AND 05C

2.1.3. DATA FROM THE ENLISTED MASTERFILE (EMF)

The Army Enlisted Masterfile (EMF) contains a
significant amount of information that is essential to
Project A. In particular, information on each soldier's
progress through his or her Army career is captured by the
EMF. The EMF also contains information on the
individual background and enlistment conditions of each
soldier that provides important checks against similar
information obtained from the accession files.

While some of the analyses will focus on a specific
MOS, others will require information on a broadly
representative cohort of soldiers. In particular, in
analyzing the generalizability of results from samples of
MOS, it is essential that such representative cohorts be
analyzed. The EMF provides one source of information on
the progress of all recruits, against which the results for
specific samples can be compared.

The following list indicates the EMF data elements
that will be needed by this project in order to avoid large
and redundant data collection costs. The variables are
grouped into seven types of information, and the use of
each type of information in the planned analyses is
indicated. Basic and background information will be
retained only once in the system. Other information, on
progress and problems, will be obtained at regular
intervals and accumulated into the data base.
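
The distinction drawn here, between items retained once and items
accumulated at regular intervals, can be sketched as a small data
structure. This is a minimal illustration under stated assumptions:
the class name, the `SCRAMID` key, the choice of which fields count
as static, and the snapshot dates are all hypothetical.

```python
from collections import defaultdict

# Fields kept once per soldier (illustrative choice, not the plan's list).
STATIC_FIELDS = {"E1SEX", "E1DOB"}


class ProgressHistory:
    """Keeps static items once and accumulates time-varying items."""

    def __init__(self):
        self.static = {}                  # soldier id -> {field: value}
        self.history = defaultdict(list)  # soldier id -> [(date, {field: value})]

    def add_snapshot(self, snapshot_date, rows):
        """Merge one periodic EMF extract (a list of dicts) into the store."""
        for row in rows:
            soldier_id = row["SCRAMID"]
            statics = self.static.setdefault(soldier_id, {})
            changing = {}
            for name, value in row.items():
                if name == "SCRAMID":
                    continue
                if name in STATIC_FIELDS:
                    statics.setdefault(name, value)  # keep the first copy only
                else:
                    changing[name] = value
            self.history[soldier_id].append((snapshot_date, changing))

    def series(self, soldier_id, name):
        """Return [(date, value), ...] for one variable of one soldier."""
        return [
            (date, values[name])
            for date, values in self.history.get(soldier_id, [])
            if name in values
        ]
```

Calling `series` on a pay-grade variable, for instance, would return
the sequence of grades across extracts, which is the kind of record
needed to chart a soldier's rate of progress.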

2.1.3.1. Basic Identifying Information

Again, these data will be used to link new EMF
information to data previously collected on the samples of
soldiers in the project. The linking variables will NOT be
stored on the main data files. Only a scrambled identifier
will be retained on the data files for linking in new
information. The EMF variables needed for linking and for
checking the validity of the linkage are:

    E1SSN       SOCIAL SECURITY NUMBER
    E1SSNPR     PREVIOUS (INCORRECT) SSN
    E1SNCTL     SSN CORRECTION DATE

2.1.3.2. Individual Background Data

These data will be used to identify differences in
backgrounds that may be predictive of performance
differences. In addition, certain key variables will be
used to check the "cultural fairness" of any proposed
selection and classification algorithm. Note that most of
this information will also be obtained from the accession
files. The corresponding EMF variables will be used to
check or verify the accession data. After completing this
check, only one copy of this information will be retained.
The background data elements from the EMF that are needed
to either add or verify essential information include:

    E1SEX       SEX
    E1RACE      POPULATION GROUP
    E1REDCAT    RACIAL/ETHNIC DESCENT
    E1ETHNIC    ETHNIC GROUP DESIGNATION
    E1CLANG     LANGUAGE IDENTITY
    E1CITIZ     CITIZENSHIP STATUS
    E1DOB       DATE OF BIRTH
    E1MARST     MARITAL STATUS
    E1NRDEP     NUMBER DEPENDENTS
    E1CIVED     ACADEMIC EDUCATION LEVEL
    E1MADCD     COLLEGE MAJOR
    E1STRD      STATE OF RESIDENCE AT ENLISTMENT
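
The check of accession data against the EMF, followed by retention
of a single copy, might look like the sketch below. The pairing of
accession and EMF variables is an assumption based on their matching
descriptions, and the rule for what to do with a disagreement is
left open, since the plan does not specify how inconsistencies are
resolved. The function name is hypothetical.

```python
# Pairs of (accession variable, EMF variable) assumed to record the same item.
PAIRED_VARIABLES = [
    ("A1SEX", "E1SEX"),
    ("A1RACE", "E1RACE"),
    ("A1ETHNIC", "E1ETHNIC"),
    ("A1CITIZ", "E1CITIZ"),
    ("A1MARST", "E1MARST"),
    ("A1NRDEP", "E1NRDEP"),
]


def reconcile(accession, emf, pairs=PAIRED_VARIABLES):
    """Compare paired fields of one soldier's two records.

    Returns (retained, discrepancies): `retained` holds the single copy
    kept for each item that agrees or is present in only one source;
    `discrepancies` lists (variable, accession value, EMF value) for
    items that disagree and still need to be resolved.
    """
    retained = {}
    discrepancies = []
    for acc_name, emf_name in pairs:
        acc_value = accession.get(acc_name)
        emf_value = emf.get(emf_name)
        if acc_value is None and emf_value is None:
            continue  # nothing recorded in either source
        if emf_value is None or acc_value == emf_value:
            retained[acc_name] = acc_value  # verified, or only in accession
        elif acc_value is None:
            retained[acc_name] = emf_value  # EMF adds a missing item
        else:
            discrepancies.append((acc_name, acc_value, emf_value))
    return retained, discrepancies
```

Items that come back in `discrepancies` would be held for review
rather than silently overwritten, so that only verified values reach
the single retained copy.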



2.1.3.3. Enlistment Conditions

The enlistment data required by this project include
physical and mental test scores and information on the
terms or conditions of enlistment. The test scores are the
primary predictive measures currently available. The
information on enlistment conditions is essential to
understanding the relationship between the test scores and
subsequent performance in the Army. As with background
data, much of the enlistment information will also be
obtained from accession files. Again, only one copy of
this information will be retained after any inconsistencies
are resolved. The required enlistment variables include:

    E1ASVBXX    ALL ASVAB AREA COMPOSITE SCORES
    E1AFQSC     ARMED FORCES QUALIFICATION TEST SCORE
    E1AFQG      AFQT GROUP
    E1DLAB      DEFENSE LANGUAGE BATTERY SCORE
    E1PHYPR     PHYSICAL PROFILE
    E1PHYCA     PHYSICAL LIMITATION CATEGORY
    E1XFACT
    E1COMPT     SERVICE COMPONENT
    E1ENLOP     ENLISTMENT OPTION CODE
    E1MORWA     ENLISTMENT/REENLISTMENT WAIVER
    E1TERMS     TERMS OF SERVICE OR ENLISTMENT
    E1BASD      BASIC ACTIVE SERVICE DATE
    E1BONIN     BONUS INDICATOR
    E1RPFLG     RECRUITER FLAG (PROMOTED OR SEPARATED)
    E1RCRCD     RECRUITER CODE
    E1PLOEN     STATE OF ENLISTMENT
    E1TYPLA     TYPE OF LAST ACCESSION
    E1DATLA     DATE OF LAST ACCESSION
    E1ETSDT     DATE OF EXPIRATION OF LAST TERM OF SERVICE



2.1.3.4. Basic Progress in the Army

A major outcome to be predicted at the time of
selection is the applicant's probable rate of basic
progress in the Army. Several EMF variables are needed to
chart the progress of the soldiers in the research samples
for use in validating new and existing predictor measures.
These include:



E1GRTIT     GRADE IN WHICH SERVING
E1DOR       DATE OF RANK
E1PAYGR     PAYGRADE
E1PAYSX     PAYGRADE & SEX
E1GRDDT     DATE OF LAST GRADE CHANGE
E1BPEDT     BASIC PAY ENTRY DATE
E1GRDTT     TYPE OF LAST GRADE CHANGE
E1NCOES     NCO EDUCATION SYSTEM (LEVEL ATTAINED)
E1PROPT     CURRENT PROMOTION POINTS
E1PROPDT    CURRENT PROMOTION POINT DATE
E1PRVPT     PREVIOUS PROMOTION POINTS
E1PRVPDT    PREVIOUS PROMOTION POINTS DATE
E1PROPA     PROFICIENCY PAY STATUS
E1AITDT     AIT GRADUATION DATE
E1PACE      SELF-PACED AIT FLAG
E1EERWA     EER WEIGHTED AVERAGE
E1TUREL     TOUR ELIGIBILITY
E1SECCLR    PERSONNEL SECURITY CLEARANCE
E1SGTID     DRILL SERGEANT QUALIFICATION
E1ADPAY     ELIGIBILITY FOR ADDITIONAL PAY
E1VEAP      VETERANS EDUCATION ASSISTANCE PROGRAM CODE



2.1.3.5. Performance in a Particular MOS

Since much of military performance is specific to
particular occupational specialties, many of the criteria
used in evaluating new and existing predictor measures will
concern progress and performance within an MOS. The
specific EMF variables required to track this information
are:



E1CMF       CAREER MANAGEMENT FIELD
E1PRMOS     PRIMARY MOS
E1DMOS      DUTY MOS
E1SMOS3     SECONDARY MOS CURRENT (3-POS)
E1PMOTT     TYPE OF LAST PMOS CHANGE
E1PMODT     DATE OF LAST CHANGE TO PMOS
E1PGMOS     PRIMARY PROGRESSION MOS
E1BOMOS     MOS OF BONUS
E1DDSID     ADDITIONAL SKILL INDICATOR, DUTY MOS
E1ADSID2    ADDITIONAL SKILL INDICATOR, PREVIOUS
E1ADSID3    ADDITIONAL SKILL INDICATOR, 2ND PREVIOUS
E1PQDES     PRIMARY MOS IN WHICH TESTED, SQ DESIGNATOR
E1PQSCR     PRIMARY SQT SCORE (FOR PQDES)
E1PQPER     SKILL QUALIFICATION PERCENTILE (FOR PQDES)
E1PMOST     PRIMARY MOS IN WHICH TESTED
E1PSQDT     DATE OF LAST CHANGE ON PMOS TESTED (SQT)
E1PMOST1    PRIMARY MOS IN WHICH TESTED, FIRST PRIOR
E1PMOST2    PRIMARY MOS IN WHICH TESTED, SECOND PRIOR
E1PRQDT     DATE OF PREVIOUS CHANGE IN PMOS TESTED
E1PRDES     PREVIOUS PRIMARY MOS IN WHICH TESTED
E1PRQSC     SQT SCORE FOR PREVIOUS MOS (PQDES)
E1PRPER     PREVIOUS SQT PERCENTILE (FOR PQDES)
E1SQDES     SECONDARY MOS SQT
E1SSQDT     SMOS SQT DATE
E1SQSCR     SMOS SQT SCORE

2.1.3.6. Indicators of Attrition and Related Problems

In addition to using measures of Army-wide and
MOS-specific progress as criteria, it will also be
essential to predict the potential for problems within the
Army. In particular, early attrition from the Army for
reasons of conduct or low performance represents an outcome
that must serve as a negative criterion in the validation
of predictor measures. The specific EMF variables required
are:



E1CHSEP     CHARACTER OF SEPARATION
E1SPNIS     SEPARATION PROGRAM DESIGNATOR
E1SEPTT     TYPE OF LAST SEPARATION
E1SEPDT     DATE OF LAST SEPARATION
E1DFRDT     DATE OF LAST DROP FROM ROLLS
E1DFRTT     TYPE OF LAST DROP FROM ROLLS
E1S[...]    STATUS OF LAST STATUS CODE CHANGE
E1STATT     TYPE OF LAST STATUS CODE CHANGE
E1LAWTT     TYPE OF LAST AWOL TRANSACTION
E1LAWDT     DATE OF LAST AWOL TRANSACTION
E1AWODT     DATE OF RETURN FROM LAST AWOL
E1AWOTT     TYPE OF LAST RETURN FROM AWOL
E1RMCTT     TYPE OF LAST RETURN TO MILITARY CONTROL



2.1.3.7. Reenlistment Eligibility and Conditions

A final indicator of each soldier's value to the Army
is whether the soldier is eligible for reenlistment and in
fact does reenlist. The specific EMF variables required to
capture this information include:



E1EREUP     REENLISTMENT ELIGIBILITY
E1EREUPP    REENLISTMENT ELIGIBILITY BATR
E1VRPMO     SELECTIVE REENLISTMENT BONUS MOS
E1VRMUL     SELECTIVE REENLISTMENT BONUS MULTIPLIER
E1VRGRD     SELECTIVE REENLISTMENT BONUS PAY GRADE
E1VRRDT     ENLISTMENT/REENLISTMENT BONUS DATE
E1VRPNR     ENLISTMENT/REENLISTMENT BONUS PAYMENT NO.
E1VRTRM     ENLISTMENT/REENLISTMENT BONUS PAYMENT TERM
E1PSVCI     NUMBER OF TIMES ENLISTED/REENLISTED



2.1.4. SKILLS QUALIFICATION TEST (SQT) DATA

Special datafiles will be obtained containing SQT
score information for soldiers in the FY81/82 accession
cohort. These data will significantly expand the SQT
information available in the EMF by adding scores on tests
not released for operational use and by adding information
on the particular form (skill level, test year, and track)
completed by each soldier. The specific data elements to
be included, apart from identifying information used only
for linkage, are:



P1MOS       MOS IN WHICH TESTED
P1SKLLVL    SKILL LEVEL TESTED
P1TRACK     FORM OF SQT AT THIS LEVEL
P1YEAR      SQT TEST YEAR
P1TESTDT    DATE OF TESTING
P1SQTSCR    SQT SCORE

2.2. FY83/84 Cohort Data

The data assembled for the FY83/84 cohort will include
all of the data assembled for the FY81/82 cohort from
existing sources plus a considerable number of new measures
developed by the project. It is not possible to specify
the exact variables at this time, but a summary of these
new measures is included below.



2.2.1. INITIAL PREDICTOR DATA

All of the application and accession variables
collected for the FY81/82 cohort will also be assembled for
the FY83/84 cohort. One significant change in these data
is that Forms 11, 12, and 13 of the ASVAB will have been
introduced. In addition, Task 2 will develop and
administer batteries of additional predictor measures.

2.2.1.1. Preliminary Battery

A preliminary battery of predictor measures will be
administered to special samples of about 2,100-4,600
FY83/84 accessions from October 1983 to June 1984 in each
of four MOS:



MOS     Title                            Training Site
05C     Radio TT Operator                Ft. Gordon, GA
19E/K   Tank Crewman                     Ft. Knox, KY
63B     Vehicle and Generator Mechanic   Ft. Dix, NJ
                                         Ft. Leonard Wood, MO
71L     Administrative Specialist        Ft. Jackson, SC



These data will be collected during the first week of the
soldiers' advanced training (AIT). Training school
achievement measures (developed in Task 3) will also be
collected as enlistees pass through these training courses
and will be used as criteria in the initial analysis of
Preliminary Battery measures.

This preliminary battery will focus on types of
predictors not currently in use. Analysis of these
measures will allow an early determination of the important
human attributes not assessed by the current pre-induction
battery, and whether the measurement of these attributes
significantly increases the accuracy with which performance
is predicted. This information will be useful for guiding
the development of new predictors into areas most likely to
increase the accuracy of prediction and classification.

The Preliminary Battery must necessarily be made up of
"off-the-shelf" instruments, because there is too little
time prior to the scheduled administration of the
Preliminary Battery to develop and pilot test new measures
of constructs deemed potentially useful. The testing will
probably be done within a four-hour period, since soldier
time at AIT schools is generally allocated in four-hour
blocks. That time period is sufficient to administer
off-the-shelf measures of biographical information,
vocational interest, motivation, and cognitive ability.
Psychomotor measures will probably not be included in the
Preliminary Battery because of the time constraints.

2.2.1.2. Trial Predictor Battery

A trial battery of predictor measures, following pilot
testing for practice effects, fakeability, and motivational
set (with the pilot test administered to samples of the
FY81/82 cohort), will be administered to an average of 500
soldiers in each of the 19 MOS. These data will be
collected between June and October 1985 from FY83/84 cohort
members who will generally be in the third year of their
first term of enlistment. Job-performance criterion data
will be collected for these same soldiers for use in a
concurrent validation of the Trial Battery. The current
plan is to develop a Trial Battery that will require a
maximum of four hours to administer, including
computer-administered and apparatus measures.

In addition to the collection of these primary data,
four research projects will be undertaken. First, to
measure test-retest reliability, the predictor battery will
be readministered to a subsample of 500 soldiers 30 days
after the initial administration. Second, to measure
practice effects, a subsample of 115 soldiers will be
readministered the battery in the week following the first
testing. Third, to measure fakeability, 115 soldiers will
be instructed to "fake good" and another 115 soldiers will
be instructed to "fake bad" on the non-cognitive portions
of the paper-and-pencil battery. Finally, to measure score
differences between "early career" soldiers (i.e., new
recruits) and the primary sample (later career soldiers) in
examining maturational effects, a sample of 1,000 new
recruits will receive the battery.




2.2.2. TRAINING MEASURES

Currently available training measures will be obtained
from the school records for input into the LRDB. In
addition, scores from job knowledge tests, MOS content
tests, performance ratings, and end-of-course knowledge
tests (EOCKT) will be added to the file.

2.2.2.1. Available Records

Training performance measures that have been
identified in Task 3 as adequate indicators, based on
interview data and on qualitative analyses, will be
obtained from school records for all recruits in the
FY83/84 cohort who receive training in one of the 19 MOS
selected for this research project. Task 3 staff will
arrange for the school to provide the required training
data to be input into the LRDB on a continuing basis from
July 1983 to September 1984, as each new class completes
training.



2.2.2.3. Prototype Measures

Preliminary and revised prototype performance measures
will be administered to samples averaging 575 soldiers from
four MOS: 05C, 19E/K, 63B, and 71L. Different test formats
will be examined, including free response measures and
synthetic hands-on performance measures. In addition,
measures of general performance in training and new indices
from existing measures will be obtained for samples from
all 10 MOS. The data collected on these prototype measures
will be analyzed to determine the relative feasibility and
value of the administration of each type of measure.

2.2.2.4. End-of-Course Knowledge Tests (EOCKT)

Revised EOCKT will be gathered on samples averaging
500 soldiers from the 19 MOS. These will be obtained at
the same time that the other measures are obtained for the
FY83/84 cohort.

2.2.3. FIRST TOUR PERFORMANCE MEASURES

Concurrent with the administration of the Trial
Predictor Battery (during the latter half of 1985, see
Section 2.2.1.2), Army-wide performance measures will be
collected from the same 19 MOS samples. These data will
include rating scale measures and behavioral indices
generated from records of commendations, disciplinary
problems, and attrition. For half of these samples (9
MOS), MOS-specific performance measures will also be
administered. The tentative list of MOS includes the
following:



11B     Infantryman
13B     Cannon Crewman
19E/K   Tank Crewman
05C     Radio TT Operator
63B     Vehicle and Generator Mechanic
64C     Motor Transport Operator
71L     Administrative Specialist
91B     Medical Care Specialist
95B     Military Police



The measures used will include hands-on task performance
tests as well as job knowledge tests and supervisor and
peer ratings.

2.2.4. SECOND TOUR PERFORMANCE MEASURES

Army-wide and MOS-specific performance measures will
also be collected from the FY83/84 cohort during their
second tour (June 1988 through September 1988). Samples of
about 100 soldiers are expected for each of 10 different
MOS (05C, 63B, 71L, 19E/K, 64C, 76Y, 91B, 94B, 11B, and 13B)
for which first tour MOS-specific performance measures are
obtained. The measures will be revised versions of the
first tour performance measures.



2.3. FY86/87 Cohort Data

The data collected on the FY86/87 cohort will be
parallel to the data collected on the FY83/84 cohort,
except that concurrent predictor measures will not be
collected on the FY86/87 cohort. Data from existing
accession and EMF records will be gathered along with data
from the predictor and criterion measures developed by this
project.

2.3.1. EXPERIMENTAL PREDICTOR BATTERY

From March 1986 through February 1987, the revised
predictor battery will be administered to samples of
recruits at the beginning of AIT. Current plans call for
testing an average of 2,200 recruits in each of the 19
focal MOS. (Data will be collected from additional MOS if
preliminary analyses indicate that other MOS are required
to assure sufficient validity generalization.)

2.3.2. TRAINING DATA

Training performance data will be obtained from
schools for the FY86/87 cohort sample who receive training
in the 19 focal MOS between March 1986 and May 1987.
The measures collected will include the EOCKT as well as
those prototypical measures that prove feasible and valid.

2.3.3. ARMY-WIDE PERFORMANCE MEASURES

The Army-wide (Task 4) and the MOS-specific (Task 5)
performance measures administered to the FY83/84 cohort
will be revised on the basis of analyses of these data.
The revised performance measures will be administered to
analogous samples of the FY86/87 cohort for use as final
validation criteria.

2.3.4. SECOND TOUR DATA

Again, revised Army-wide and MOS-specific performance
measures will also be administered to samples of the
FY86/87 cohort who remain for a second tour of duty.

3. EDITING SPECIFICATIONS

For each set of data to be entered into the data base,
a detailed set of editing specifications will be developed,
reviewed, revised, and implemented. These specifications
will give procedures for linking the new data to existing
records, identifying erroneous or improbable values,
correcting these values, and replacing missing values where
appropriate. Editing specifications for the FY81/82 cohort
training data are given below as an example.

3.1. Editing Specifications for FY81/82 Training Data

3.1.1. LINKAGE TO OTHER FILES

Prior to the detailed editing of each field, the 1981
training data will be linked to the FY81/82 Accession data
file and to the 1982 year-end EMF file. The reason for
this prior linkage is two-fold. First, the 1981 training
data file contains records on some number of soldiers who
are not of interest to the current study. These include
soldiers not in the regular Army, soldiers who actually
entered prior to FY81, and soldiers who are actually
reenlistees. By eliminating these soldiers first, editing
resources can be concentrated on the cases of primary
interest. The second reason for prior linkage is that the
information from the Accession and EMF files will provide
important checks on the reasonableness of the training data
fields and will provide information essential to the
correction of missing or invalid values.



The linkage of additional data will be accomplished in
two stages. The first stage will involve matching the
training records to a special "Link" file which contains
identifying information on the soldiers of interest. (See
the discussion of the "Link" file in Section 4.3.) For the
training records that match a record in the Link file,
identifying information will be stripped and replaced with
the scrambled identifier from the Link file. This
scrambled identifier serves as the primary key for matching
data already in the data base. The second stage will be to
merge the training data with other information in the data
base using the scrambled identifier.
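
The two stages can be sketched in Python as follows. This is
only an illustration under stated assumptions: the field names
ssn, name, and scrambled_id, and both function names, are
hypothetical and do not come from the LRDB specification.

```python
def link_training_records(training, link_file):
    """Stage 1: replace identifying fields with the scrambled identifier.

    `training` and `link_file` are lists of dicts. The keys "ssn", "name",
    and "scrambled_id" are assumed names used only for this sketch.
    """
    ssn_to_id = {row["ssn"]: row["scrambled_id"] for row in link_file}
    linked, unmatched = [], []
    for rec in training:
        scrambled = ssn_to_id.get(rec["ssn"])
        if scrambled is None:
            unmatched.append(rec)  # held for a later matching pass
            continue
        # Strip identifying information before the record is stored.
        clean = {k: v for k, v in rec.items() if k not in ("ssn", "name")}
        clean["scrambled_id"] = scrambled
        linked.append(clean)
    return linked, unmatched


def merge_by_scrambled_id(linked, data_base):
    """Stage 2: attach training fields to records already in the data base."""
    by_id = {row["scrambled_id"]: row for row in data_base}
    return [{**by_id.get(rec["scrambled_id"], {}), **rec} for rec in linked]
```

Records returned in the unmatched list keep their identifying
fields so that a later pass can still work with them.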

Two passes will be used in the initial match to the
Link file. The initial pass will match on SSN. For each
training record which does not match a Link file record, a
second match will be attempted. The purpose of this match
is to identify errors in the coding of SSNs. In this
second pass, the training records will be matched to the
Link file on the basis of the name field (actually the
first 5 characters of name) and on MOS. (It is expected
that each of the initially unmatched training records may
match many Link file records.) For each match, the SSNs
will be compared and a new variable, NMATCH, will be
computed as the number of matching digits. A frequency
distribution will be run on NMATCH to determine an
appropriate cutoff point for accepting a match. (We
currently expect to accept matches with 7 or more digits in
common.) Accepted matches will have all identifying
information replaced with the scrambled identifier and will
be merged with the main datafile by scrambled identifier.
In addition, a dummy record with the alternate SSN will be
inserted into the Link file for use in future matches.
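
A minimal sketch of the second pass is given below. The record
layout (keys ssn, name, mos, scrambled_id) is assumed for
illustration, and keeping only the best-scoring candidate for
each training record is a simplification not stated in the
plan. The cutoff is left as a parameter because it is to be
chosen from the NMATCH frequency distribution.

```python
from collections import Counter


def nmatch(ssn_a, ssn_b):
    """Number of digit positions at which two SSN strings agree."""
    return sum(a == b for a, b in zip(ssn_a, ssn_b))


def second_pass(unmatched, link_file, cutoff):
    """Match on the first 5 characters of name and on MOS, then score by NMATCH.

    Returns the accepted matches and the frequency distribution of NMATCH
    over every candidate pair examined.
    """
    index = {}
    for row in link_file:
        index.setdefault((row["name"][:5], row["mos"]), []).append(row)

    accepted, distribution = [], Counter()
    for rec in unmatched:
        best, best_n = None, -1
        for cand in index.get((rec["name"][:5], rec["mos"]), []):
            n = nmatch(rec["ssn"], cand["ssn"])
            distribution[n] += 1
            if n > best_n:
                best, best_n = cand, n
        if best is not None and best_n >= cutoff:
            # (alternate SSN, scrambled identifier, NMATCH)
            accepted.append((rec["ssn"], best["scrambled_id"], best_n))
    return accepted, distribution
```

The alternate SSN in each accepted tuple is the value that
would be written to the Link file as a dummy record.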




3.1.2. ELIMINATION OF DUPLICATE RECORDS

The training datafile is known to contain both exact
duplicate records and also valid instances of multiple
records for the same soldier due to recycling. The next
step in the editing process will be to eliminate the exact
duplicates and create a CYCLENO variable (which numbers the
training courses taken by an individual soldier
sequentially beginning with 1 for the course with the
earliest enrollment date) for other instances of multiple
records. The CYCLENO variable, together with the soldier's
scrambled SSN, will uniquely identify each valid record in
the training file.

The first step in this process is to eliminate all
records where the preceding record contained identical
values for all fields of interest; in this case, all fields
except name. After this has been completed, a second pass
will be made to identify obviously valid recycles. The
file will be sorted by ID and by T1GRDDTE
(graduation/recycle date). The first record for each
soldier will have CYCLENO set to 1. Subsequent records
will be accepted as valid and have the CYCLENO variable
increased by 1 if the following conditions are met:

(1) the disposition variable (T1DISP) in the
    preceding record has a value of A or B (recycle
    or transfer)

(2) the T1GRDDTE variable for the current record is
    at least 10 days greater than the T1GRDDTE value
    for the preceding record. (A frequency
    distribution on this difference will be run to
    check the reasonableness of this cutoff date.)

All duplicates not meeting these two criteria will be sent
to an error file, printed, and inspected by hand for
further resolution. It is expected that these records will
either be true duplicates with data entry errors or valid
recycles with errors in T1DISP or T1GRDDTE.
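
The duplicate and recycle logic above might be sketched as
follows. The key name id and the function name are
hypothetical. Two simplifications are assumed: exact duplicates
are dropped wherever they occur rather than only when adjacent,
and a rejected record is not treated as the "preceding" record
for the next comparison.

```python
from datetime import date


def assign_cycleno(records, min_gap_days=10):
    """Drop exact duplicates, then number valid recycles with CYCLENO.

    Each record is a dict with "id" (scrambled identifier), "name",
    "T1GRDDTE" (a datetime.date), and "T1DISP". Returns (valid, errors).
    """
    # Pass 1: remove records identical on every field except name.
    seen, unique = set(), []
    for rec in records:
        key = tuple(sorted((k, v) for k, v in rec.items() if k != "name"))
        if key not in seen:
            seen.add(key)
            unique.append(rec)

    # Pass 2: sort by ID and graduation/recycle date, then number records.
    unique.sort(key=lambda r: (r["id"], r["T1GRDDTE"]))
    valid, errors, prev = [], [], None
    for rec in unique:
        if prev is None or rec["id"] != prev["id"]:
            prev = dict(rec, CYCLENO=1)
            valid.append(prev)
            continue
        gap = (rec["T1GRDDTE"] - prev["T1GRDDTE"]).days
        if prev["T1DISP"] in ("A", "B") and gap >= min_gap_days:
            prev = dict(rec, CYCLENO=prev["CYCLENO"] + 1)
            valid.append(prev)
        else:
            errors.append(rec)  # sent to the error file for hand inspection
    return valid, errors
```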



3.1.3. INDIVIDUAL FIELD EDITS

The editing specifications for each field are given
below. (See Section 2.1.2. for a list of the variable
names to be edited.) In each case, error records are to be
listed individually and manually inspected for error
resolution. Where a large number of errors occur in a
given field, machine corrections will be developed as
appropriate. In each case, a default procedure for the
imputation of missing or invalid data is given.

a. T1MOSAWD: values must match the list of valid MOS
for this field. A cross-tabulation of T1MOSAWD by A1TRNMOS
(training MOS from the accession file) is to be run to
resolve invalid values. For records where the T1MOSAWD
code is invalid or missing and an EMF record has been
linked, the E1PMOS and E1DMOS variables will also be used
in error resolution.

b. T1SCHOOL: must be a valid school code for this
MOS.

c. T1COURSE: must be a valid code for this school
and MOS.

d. T1CLASS: must be a valid code for this course and
school.

e. T1SKLLVL: must be a valid code for this MOS.

f. T1ENRDTE: must be a valid date, less than
T1GRDDTE, and greater than or equal to A1ENTDTE. A
distribution will be run on the number of days between
A1ENTDTE and T1ENRDTE to establish an appropriate cutoff
for the identification of outliers. (Note that this edit
may also catch errors in A1ENTDTE.) In most cases,
T1ENRDTE must be identical for specific course and class
codes. In such cases, the modal value will be substituted
for missing or invalid values.

g. T1GRDDTE: must be a valid date and greater than
T1ENRDTE. Graduation date values will also be compared
with the modal value among graduates of the same course and
class. For recycles and attritions, the value must be less
than or equal to the modal value except in the case of
self-paced classes.

h. T1DISP: must be a valid code for this field. A
table of T1DISP by T1ATTRIT will be examined to determine
valid combinations. Basically, T1ATTRIT should be blank
for graduates (F, G, or H) and for progressive transfers
(E) and nonblank for recycles, attrition transfers, and
relief (A, B, or C).

i. T1ATTRIT: must be a valid code and consistent
with T1DISP as specified above. If attrition is indicated
prior to 30 September 1982 and an EMF record is matched,
the attrition code will be compared to E1CHSEP (character
of separation) and E1SEPTT (type of separation) and
T1GRDDTE compared to E1SEPDT (separation date). Frequencies
and cross-tabulations will be run to determine which
combinations are to be treated as errors.

j. T1SCORE1: Frequency distributions will be run for
each school, MOS awarded, and course to determine cutoff
values for the identification of outliers. For some MOS,
the scores will be compared to the T1DISP and T1ATTRIT
values to assure that scores are either missing or below a
cutoff value for recycles or academic attritions. Existing
documentation and subsequent inquiries will be required to
complete the specification of the treatment of the field
and to create a type of score value, T1STYPE1, that allows
for proper interpretation of this field.

k. T1SCORE2: For certain MOS, a second score was
recorded. This field will be treated with appropriate
analyses of outliers in accordance with existing
documentation. A second score type variable, T1STYPE2,
will also be created. Two additional variables will be
generated from data initially in the T1SCORE2 field. For
MOS 05B and 05C, a variable T1MORSE will indicate
completion of Morse code training. For MOS 11X, the actual
MOS awarded will be determined and a variable, T1SELECT,
created to indicate whether the awarded MOS had been
originally guaranteed.
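
Two of the edits above (f and h) can be sketched as follows,
under stated assumptions: dates are datetime.date values,
A1ENTDTE and T1GRDDTE are present on every record, the flag
name T1ENRDTE_IMPUTED is hypothetical, and disposition codes
not listed in edit h are simply treated as errors.

```python
from collections import Counter
from datetime import date


def modal_value(values):
    """Most frequent non-missing value, or None if there is none."""
    counts = Counter(v for v in values if v is not None)
    return counts.most_common(1)[0][0] if counts else None


def edit_enrollment_dates(records):
    """Edit f: require A1ENTDTE <= T1ENRDTE < T1GRDDTE.

    Missing or invalid enrollment dates are replaced by the modal valid
    date for the same course and class. Records are updated in place.
    """
    def ok(r):
        d = r["T1ENRDTE"]
        return d is not None and r["A1ENTDTE"] <= d < r["T1GRDDTE"]

    by_class = {}
    for rec in records:
        by_class.setdefault((rec["T1COURSE"], rec["T1CLASS"]), []).append(rec)

    for group in by_class.values():
        mode = modal_value(r["T1ENRDTE"] for r in group if ok(r))
        for r in group:
            r["T1ENRDTE_IMPUTED"] = 0
            if not ok(r) and mode is not None:
                r["T1ENRDTE"] = mode
                r["T1ENRDTE_IMPUTED"] = 1
    return records


def disp_attrit_consistent(disp, attrit):
    """Edit h: T1ATTRIT blank for E, F, G, H and nonblank for A, B, C."""
    blank = attrit is None or attrit.strip() == ""
    if disp in ("E", "F", "G", "H"):
        return blank
    if disp in ("A", "B", "C"):
        return not blank
    return False  # unlisted code: flagged for manual inspection
```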

3.1.4. MACHINE CORRECTION OR IMPUTATION

After manual inspection of all error records,
resolvable cases will be updated and the initial edit will
be rerun. For cases where missing or invalid values
remain, imputed values will be substituted. (Each variable
imputed will also be flagged with a binary flag so that
imputed values can be identified and, if desired, deleted
in later analyses.)

For the categorical variables, "predictor" variables
are already indicated in the above consistency edits. In
each case, imputed values will be generated randomly with
probabilities proportional to the conditional distribution
of the variable in question (conditioned on the values of
the predictor variable(s)). In many cases, this simply
means substituting the one school code where this MOS is
taught if the school code is missing, or making the course
code consistent with the school and MOS codes. In other
cases, values may actually be generated probabilistically.
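
The categorical imputation just described could look roughly
like the sketch below. The function name, the _IMPUTED flag
suffix, and the fixed random seed are assumptions made for
illustration. Missing values are represented by None.

```python
import random
from collections import Counter, defaultdict


def impute_categorical(records, target, predictors, rng=None):
    """Fill missing `target` values by sampling from the distribution of
    `target` observed among records with the same predictor values.

    Adds a binary flag `<target>_IMPUTED` to every record (1 = imputed).
    Records are updated in place.
    """
    if rng is None:
        rng = random.Random(0)  # fixed seed so that reruns are repeatable

    # Conditional distribution of the target given the predictor values.
    dist = defaultdict(Counter)
    for rec in records:
        if rec[target] is not None:
            dist[tuple(rec[p] for p in predictors)][rec[target]] += 1

    for rec in records:
        rec[target + "_IMPUTED"] = 0
        if rec[target] is None:
            counts = dist.get(tuple(rec[p] for p in predictors))
            if counts:  # leave the value missing if nothing was observed
                values = list(counts)
                weights = [counts[v] for v in values]
                rec[target] = rng.choices(values, weights=weights, k=1)[0]
                rec[target + "_IMPUTED"] = 1
    return records
```

When only one value has been observed for a given predictor
combination, as with an MOS taught at a single school, the
draw is deterministic and reduces to simple substitution.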



For continuous variables (T1SCORE1 and T1SCORE2), the
SAS procedure PROC IMPUTE will be used to generate imputed
values from initial ASVAB test scores.

In all cases, the exact details of the machine
procedures for error resolution will be refined using
information from the outcome of the editing procedures.

3.2. Editing Other FY81/82 Data

The editing of other FY81/82 data (Accession,
Applicant, EMF, and SQT) will proceed in a similar fashion.
After initial linkage, editing will proceed
variable-by-variable, using the best available information
to test or correct the data in each field. Copies of the
appropriate Army Regulations will be obtained to aid in the
editing as well as the documentation of each field.

4. DATABASE STORAGE AND ACCESS PROCEDURES

At the time that the proposal for this project was
developed, RAPID was identified as the most cost-effective
data base management system (DBMS) that meets Project A
needs. This decision was based on three important features
of RAPID. The first was the storage and access mode
employed by RAPID. RAPID uses a "transposed file"
organization, which means that it stores together all the
information on a single variable rather than all of the
information on a single "case" or respondent. It stores
the data in a direct access file with appropriate indices
so that it can read selected variables without having to
read through the entire file. The standard statistical
packages, in contrast, employ a sequential access mode and
store data by case. Even when only a few cases and
variables are required, the entire file must be read in
order to select the desired information. Most other common
DBMSs do use direct access files, but still store
information by case so that they only add additional
overhead in accessing selected variables.

The second important feature of RAPID is that it
provides for a significant degree of data compression.
This means that it will be feasible to store much more of
the data on mass storage units, greatly increasing the
speed with which these data can be retrieved in comparison
to tape storage.

The final advantage of RAPID is that it provides
convenient interfaces with both SAS and SPSS (as well as
other) statistical packages. This facilitates the creation
of special analysis files and the use of SAS to manipulate
data to be loaded into the data base.
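
The difference between storage by case and the "transposed
file" organization can be illustrated with a small in-memory
sketch. It is not a description of RAPID's actual file format;
the function and variable names are hypothetical.

```python
def to_transposed(cases, variables):
    """Store one list per variable instead of one record per case."""
    return {var: [case[var] for case in cases] for var in variables}


def read_variables(store, wanted):
    """Return one tuple per case, touching only the requested variables."""
    columns = [store[var] for var in wanted]
    return list(zip(*columns))


# Storage by case: every record carries every variable.
cases = [
    {"id": 1, "afqt": 55, "mos": "05C"},
    {"id": 2, "afqt": 71, "mos": "71L"},
]

# Transposed storage: an analysis that needs two variables never
# reads the third.
store = to_transposed(cases, ["id", "afqt", "mos"])
subset = read_variables(store, ["id", "afqt"])
```

With storage by case, selecting the same two variables would
require reading every full record.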

4.2. Anticipated File Structure

RAPID is a "relational" data base system. It
processes a series of "relations" which may be viewed as
data tables where the columns are different variables and
the rows are different observations. Each row is
"identified" by one or more columns which provide the keys
for accessing the information in the table. Each row must
have a unique combination of key values.

Relations are "normalized" if they contain no
"redundant" information. This frequently means creating
several subfiles with different fields. In the FY81/82
training file, for example, the MOS, school, course, and
class information are constant for all soldiers in the same
class. A more efficient storage of information would
result from maintaining course and class information in a
separate relation (file) with only one entry (row) for each
course and class and then keeping only an index to this
information on the individual soldier records. The file
design involves a trade-off between the savings in storage
and reductions in processing costs, when only the smaller
file(s) need to be accessed, and the corresponding increase
in processing costs, when it is necessary to join
information separated into different files. At present, we
can only forecast requirements approximately so an exact
optimization is not possible. During the course of the
project, statistics on actual access requirements will be
used to reevaluate our file and subfile design.

The organization of data into files currently planned
is given below along with some discussion of the rationale
behind the proposed organization. Table 1 summarizes the
different file types that are planned and gives a
three-character designator for each type that will be used
as a prefix in the file name.

TABLE 1
LRDB FILE DESIGNATORS

PSF - Primary Soldier Files, one file for each cohort, one
      record for each soldier in the corresponding cohort,
      keyed by scrambled ID.

APF - Applicant Files, one file for each cohort, one record
      for each application not leading to accession, keyed
      by scrambled ID and application number.

SSF - Sample Soldier Files, a separate file for each of the
      MOS selected for special data collection (FY83/84
      and FY86/87 cohorts only), keyed by scrambled ID
      within file.

[...] - Soldier Progress Files, one file for each cohort, one
      record for each EMF record pulled (tentatively 4 EMF
      records per year) for each individual in the
      corresponding Primary Soldier File, keyed by
      scrambled ID and month of enlistment.

FTF - Field Test Files, one file for each field test event,
      one record for each soldier tested, keyed by
      scrambled ID.

MOS - MOS Files, one file, one record for each MOS, keyed
      by MOS.

TSK - MOS/TASK Files, one file, one record per MOS and
      Task, keyed by MOS and task code.
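
As an illustration of the normalized organization discussed in
this section, the sketch below moves the MOS, school, course,
and class information from the training records into a
separate relation and keeps only an index on each soldier
record. The function names and the class_idx field are
hypothetical.

```python
CLASS_FIELDS = ("T1MOSAWD", "T1SCHOOL", "T1COURSE", "T1CLASS")


def normalize_training(records):
    """Split training records into a soldier relation and a class relation.

    The class relation has one row per distinct course and class; each
    soldier row keeps only an index ("class_idx") into that relation.
    """
    class_index = {}     # class-level values -> row number
    class_relation = []  # one row per course and class
    soldier_relation = []
    for rec in records:
        key = tuple(rec[f] for f in CLASS_FIELDS)
        if key not in class_index:
            class_index[key] = len(class_relation)
            class_relation.append(dict(zip(CLASS_FIELDS, key)))
        row = {k: v for k, v in rec.items() if k not in CLASS_FIELDS}
        row["class_idx"] = class_index[key]
        soldier_relation.append(row)
    return soldier_relation, class_relation


def join(soldier_relation, class_relation):
    """Rebuild full records; this is the extra cost paid when both are needed."""
    joined = []
    for row in soldier_relation:
        full = {k: v for k, v in row.items() if k != "class_idx"}
        full.update(class_relation[row["class_idx"]])
        joined.append(full)
    return joined
```

Analyses that need only soldier-level fields read the smaller
soldier relation alone, while analyses that need class-level
fields pay for the join.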
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A. 2. I. PRIMARY SOLDIER FILES 

There will be three primary soldier files, one for each 
of the three main cohorts. These files will contain all of 
the ••constant** information on each accession in the cohort 
(i.e., each jjicqession during the period that defines the 
cifliort). This information will Include all information 
from the current accession record, information on the 
completion of training, and information on reenlistment 
decisions. This file will be keyed by soldier identifier 
(scrambled SSN) . 

An abbreviated primary soldier file will be maintained 
for each of the gap periods between the three main cohorts. 
These files, which will contain only accession information, 
will be of primary use to Project B in the development of 
forecasting models.

4.2.2. APPLICANT FILES

A separate applicant file will be maintained for the
accession period corresponding to each of the three
cohorts. This file will be keyed by the same scrambled
identifier used in the Primary Soldier File and by an
occurrence number within each individual ID. There will be
one record for each application of each individual. In
order to avoid duplication, only application information
not leading to an accession of interest will be kept here.
By concatenating these files with the Primary Soldier
Files, however, a complete set of application data can be
obtained.
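
As an illustration only, the concatenation of an Applicant
File with the corresponding Primary Soldier File might look
like the following sketch. The record layout and the field
names (sid, app_no, afqt) are hypothetical and are not taken
from the LRDB design; in particular, it is an assumption that
accession records carry an application number.

```python
# Hypothetical records: all field names and values are illustrative assumptions.
primary_soldier = [  # applications that led to the accession of interest
    {"sid": "A17", "app_no": 2, "afqt": 61},
    {"sid": "B42", "app_no": 1, "afqt": 48},
]
applicant = [  # applications NOT leading to an accession of interest
    {"sid": "A17", "app_no": 1, "afqt": 55},
    {"sid": "C03", "app_no": 1, "afqt": 39},
]


def all_applications(psf, apf):
    """Concatenate both files, ordered by scrambled ID and application number."""
    return sorted(psf + apf, key=lambda r: (r["sid"], r["app_no"]))


complete = all_applications(primary_soldier, applicant)
```

Because the two files never hold the same application twice,
no de-duplication step is needed after the concatenation.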

Each record will contain test scores and other
information relating to the particular application,
including background data that ought not to change from one
application to another but might change anyway. These data
will be useful in establishing overall base rates for
applicants and for looking at the level of consistency in
different variables across applications.

4.2.3. SAMPLE SOLDIER FILES 

For the FY83/84 and FY86/87 cohorts, there will be
Sample Soldier Files consisting of all of the soldiers
sampled for special data collection. Currently, we plan to
maintain 19 separate files corresponding to the 19
different MOS sampled for new data collection. This will
facilitate the creation of separate analysis files for each
MOS. It will, of course, be a simple matter to concatenate
these files for across-MOS analyses.

The Sample Soldier Files will also be keyed to an
alternate identifier defined as the index number of the
corresponding Primary Soldier File record. They will not
contain any other variables stored in the Primary Soldier
Files, but they will be directly linkable to the Primary
Soldier Files by this index number (without further
sorting). These files will contain all of the new measures
collected on each soldier in the selected samples. Some of
the measures collected will vary from one MOS to the next,
particularly the MOS performance measures and the job
knowledge and hands-on measures collected during training.
(This is a major reason for maintaining separate files by
MOS.) It is likely that these files will also be further
divided by data collection period. The FY83/84 second-tour
sample, for example, will be only a subset of the
concurrent validation samples, and the concurrent
validation samples will also be different from the samples
receiving the Preliminary Predictor Battery.
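
A minimal sketch of the index-number linkage follows. It
assumes, purely for illustration, that the Primary Soldier
File can be treated as a list whose positions serve as index
numbers; the field names psf_index and hands_on are
hypothetical.

```python
# Hypothetical layout: list position in the PSF acts as the index number.
psf = [
    {"sid": "A17", "mos": "11B"},
    {"sid": "B42", "mos": "11B"},
    {"sid": "C03", "mos": "11B"},
]
ssf = [  # one Sample Soldier File; stores only the pointer and the new measures
    {"psf_index": 2, "hands_on": 71.5},
    {"psf_index": 0, "hands_on": 64.0},
]


def link(sample_records, primary):
    """Attach PSF fields by direct index lookup; no sorting or searching."""
    return [{**primary[r["psf_index"]], **r} for r in sample_records]


linked = link(ssf, psf)
```

The lookup cost is the same regardless of the order of the
sample records, which is the practical meaning of "without
further sorting."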

4.2.4. SOLDIER PROGRESS FILES

Separate Soldier Progress files will be used to store
recurring information on each soldier's progress in the
Army. There will be separate Soldier Progress files for
each cohort. These files will be keyed by soldier ID and a
generated variable TOURMON which gives the number of months
since the beginning of active service. The contents of
these files will come primarily from the EMF, which will be
accessed at regular intervals, and from special SQT files.
The primary purpose of these files is to provide the basis
for time series or career trajectory analyses in which each
soldier's progress is charted as a function of time in
service.
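
The sketch below shows one way a TOURMON-style key could be
derived and used to order a soldier's records. The exact
definition is an assumption (whole months completed between
entry on active service and the date of the EMF extract), and
the dates and paygrades are made up.

```python
from datetime import date


def tourmon(entry: date, snapshot: date) -> int:
    """Whole months completed between entry on active service and a snapshot."""
    months = (snapshot.year - entry.year) * 12 + (snapshot.month - entry.month)
    if snapshot.day < entry.day:
        months -= 1  # the current month is not yet complete
    return months


# Hypothetical quarterly EMF pulls for one soldier.
entry = date(1983, 10, 15)
pulls = [
    (date(1983, 12, 31), "E1"),
    (date(1984, 3, 31), "E2"),
    (date(1984, 6, 30), "E3"),
]
trajectory = [(tourmon(entry, d), grade) for d, grade in pulls]
```

Keying on months in service rather than calendar date lets
soldiers who entered at different times be compared at the
same point in their tours.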

4.2.5. FIELD TEST FILES

A separate datafile will be created for each field
test of each new instrument or battery. These files will
be keyed by alternate identifier (index number in the
relevant Primary Soldier File) so as to be readily linkable
to all other information on the same soldiers. The
contents of each file will be highly specific to the
related field test.



4.2.6. MOS FILES

A separate file will be maintained which will contain
information on the characteristics of each MOS. This file
will be keyed by the three-character MOS code. The
specific contents of the file are not fully known at this
time. Some information on qualifications for each MOS,
workforce size and requirement forecasts, training
location(s), and utility measure data collection will be
included.

4.2.7. TASK FILES

Information on specific tasks performed within each
MOS is available from several sources (e.g., the Army
Occupational Survey Program, the Soldier's Manual, the RCA
study of prerequisite competencies using TRADOC sources).
In developing both training and MOS-specific performance
measures, it will be desirable to maintain a file of these
tasks for at least the MOS selected for special data
collection.



4.3. Updating Procedures

Formal updating of the LRDB will be carefully
controlled. It is essential that updating be an orderly
process to protect the integrity of the data base.
Consequently, the procedures for modifying the LRDB
will be made available only to the data base administrator
and to the ARI data base monitor. Other requests to use
these procedures for creating/updating other (non-LRDB)
files will be evaluated on merit and granted only with the
approval of the ARI monitor and the data base
administrator.



In many instances, file updates will involve adding
derived variables or indices. In the course of analyses, a
large number of such variables will be added to workfiles.
Where the general applicability of such variables is judged
to warrant the increase in storage space, those variables
will be added to the master data base.



The process of adding new data to the file will
involve several steps. These steps are designed to
minimize the need for further changes or corrections once
the data become available. Insofar as possible, such
changes will be strictly avoided so as to eliminate the
need for rerunning significant analyses to reflect
corrected data. The steps to be followed in updating the
file include the following:



4.3.1. IDENTIFICATION AND ACQUISITION OF NEW DATA AND
RELATED DOCUMENTATION

In the case of the acquisition of existing data, this
step will be relatively simple. For new measures to be
collected, however, data base staff will expect to play
a significant role in the design of the data
collection instruments to facilitate data entry.

For data that are not now in machine-readable form,
the data base staff will provide for data entry. Plans call
for the use of the interactive entry/edit system (FORMSPEC)
available at AIR's Washington Office. If similar software
can be installed at NIH, we will switch to entering data
directly into NIH.



4.3.2. LINKING IN RELATED DATA

A separate Link file will be maintained to facilitate
the addition of new data. This file will contain basic
identifying information (name, birthdate, primary MOS,
race, sex) and pointers to (index numbers) records in each
[...] be passed against the Link file. For each matching
record, all identifying information will be deleted and
replaced with the appropriate pointers. For initial
nonmatches, a second attempt will be made to match to the
Link File on the basis of secondary identifiers, including,
if necessary, manual inspection of "close" matches. For
many [...] not already in the Link file. Where it is
desired that [...] the Link file and the data will be
retained. This will be the case only if a new relation is
being established so that it should cause no problem with
the index values stored in the Link file.
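
The two-stage matching against the Link file can be sketched
as follows. This is a simplified illustration under stated
assumptions: the primary match key (called ident here), the
choice of name and birthdate as secondary identifiers, and all
field names and values are hypothetical. Manual inspection of
close matches is not modeled.

```python
# Hypothetical Link file: list position serves as the index number (pointer).
link_file = [
    {"ident": "0001", "name": "DOE JOHN", "birthdate": "1964-02-03"},
    {"ident": "0002", "name": "ROE JANE", "birthdate": "1965-07-19"},
]
ID_FIELDS = ("ident", "name", "birthdate")


def match_to_link(record, links):
    """Return the Link file index for a record, or None if nothing matches."""
    for i, entry in enumerate(links):  # first pass: primary identifier
        if record.get("ident") == entry["ident"]:
            return i
    for i, entry in enumerate(links):  # second pass: secondary identifiers
        same_name = record.get("name") == entry["name"]
        same_birth = record.get("birthdate") == entry["birthdate"]
        if same_name and same_birth:
            return i
    return None


def strip_identifiers(record, pointer):
    """Drop identifying fields; keep only the pointer and the measures."""
    kept = {k: v for k, v in record.items() if k not in ID_FIELDS}
    return {"link_index": pointer, **kept}


new_record = {"ident": "9999", "name": "ROE JANE",
              "birthdate": "1965-07-19", "score": 12}
pointer = match_to_link(new_record, link_file)
clean = strip_identifiers(new_record, pointer)
```

In this example the primary identifier is miskeyed, so the
record is matched only on the second pass.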

Once all links to existing data have been established,
the existing data needed for editing will be pulled out of
the data base and merged with the new dataset. It is
expected that this merge will be accomplished using SAS,
since the edit procedures are designed as a SAS
application.



4.3.3. EDITING

The editing procedures are described in detail in
Section 1. They will typically involve two passes. In the
first pass, specifications [...] will be developed and
implemented. After inspecting the results of this editing
pass, error resolution specifications will be developed and
implemented as a second pass.

4.3.4. DOCUMENTATION

Following the completion of the editing process, the
documentation of the new data will be accomplished. A
central part of this activity will be establishing the
codebooks, including frequencies and descriptive statistics
as described in Section 5 of this plan.

4.3.5. MERGING THE NEW DATA

After the documentation has been completed, reviewed,
and revised as necessary, the new data will be formally
merged into the DBMS and appropriate backup tapes will be
created (using the RAPID UNLOAD procedure).

The final step in the addition of new information to
the LRDB will be to inform potential users of the
availability of the data and the documentation for the
data. This will be accomplished through the electronic
bulletin board implemented as part of the project sign-on
procedures and through mailing to a list of ARI and project
staff designated to receive information on the data base.
This mailing list will be established and reviewed by the
Project Director and Principal Investigator and by senior
ARI staff assigned responsibility for monitoring this
activity.

4.3.7. EXCEPTIONS

It is expected that there will necessarily be
exceptions to this orderly process. The most common form
of exception is when quick analyses are required even
though the data have not been completely edited. In most
cases such analyses can proceed with the completion of step
2 (linking) and run in parallel with the editing. In a few
cases, it may be necessary to strip identifiers and proceed
with a copy of the input data only. In any event, the
establishment of an orderly process makes the exceptions
clearer. If preliminary analyses are run, they will be
designated as such and checked as needed once the full
update process has been completed.



4.4. Access

Primary access to the LRDB will be through the SAS
interface procedure, PROC RAPID. The Task 1 staff are all
experienced SAS users and plan to conduct most of the
analytic investigations using the SAS package. SAS is also
the package of choice because of its capacity for merging
and transforming data easily. We intend to further
simplify access to the data by creating a WYLBUR Command
Procedure that will take a file name and variable list and
create the SAS setup to read the requested variables into
SAS and attach all of the appropriate variable labels and
formats (for value labelling). This procedure will greatly
simplify authorized access to the data base and will also
contain a log file to monitor such accesses. (As discussed
below, in Section 6, we will also place a logging procedure
within the catalogued procedure that accesses the data
base as a further control on access.)

After the first portions of the data base are loaded,
we will conduct a small analysis of the relative
efficiency of using RAPID operations to join
information stored in different relations in comparison to
using SAS merge operations to accomplish the same
objective. This issue is not of major concern since SAS
will accomplish this objective with reasonable efficiency,
but it is of interest in cases where access from other
procedures is required or where very large datasets are
being created.



In addition to providing access to the data base
through SAS, we will implement procedures for generating
SPSS systems files and also raw data files. A TPL
interface is available, and we will install it if any need
becomes apparent.

In general it is expected that requests for analysis
files will be channelled through either the Project's
Database Coordinator or ARI's monitor for this activity
(who may also then pass the request on to the Database
Coordinator). This pattern is expected to act both as a
means of assuring a close monitoring of access to the data
and also to insure reasonable efficiency since the data
base staff will be most knowledgeable about the data base
contents and access procedures. Except in very high
priority cases, it is expected that workfile creation runs
will be created overnight during the discount period. With
the WYLBUR command procedures in place, we expect that
workfiles can be created with only a 24-hour turnaround in
most cases.

5. DATA DOCUMENTATION AND DISSEMINATION

5.1. Documentation Formats and Standards



Because the project involves the simultaneous
collection and analysis of many interrelated sets of data
by different teams of researchers, it demands particular
effort in clear and complete documentation of the data
base. This effort is complicated by the fact that the data
base will not be constant, but rather will grow throughout
the project as new measures are developed and new data are
collected. It is essential, therefore, that the system for
data documentation be carefully developed and strictly
enforced from the outset of the project.

We will implement a multilevel system of interrelated
data documentation documents that together will allow users
easily to gain complete information on the data they may
need to use. The key elements of this "metasystem" of
documentation include:

o  An Event File that documents each data
   collection event;

o  An Instrument File that contains copies of
   the data collection instruments (including
   both questionnaires/tests and answer
   sheets where separate);

o  A Sample Structure File giving a list of
   the different samples used in the data
   base and showing their relationship in
   Venn diagram type format;

o  A Dataset Log that shows the name,
   characteristics, and location of each data
   set in the data base and refers to the
   appropriate codebook documentation of the
   data set;

o  SAS Codebooks for each data set, including
   frequency distributions for each discrete
   variable and complete summary statistics
   for continuous variables; for derived
   variables, the computational formula will
   be indicated; for other variables, the
   source file will be shown;

o  A Variable Cross-Reference File, listing
   each of the variables in the data base
   topically and by variable name and giving
   a list of all of the data sets for which
   the variable is available;

o  Data History Documentation for each data
   set, including an overall flowchart
   showing the steps and workfiles in the
   file creation/editing process and the
   printed output from each step in this
   process.

Each of these logs or files is described more fully
below. We plan to use the WYLBUR text entry/editing
capabilities to maintain "on-line" versions of all but the
instrument file (where only the contents and index will be
on-line). Hardcopy versions of each of these text files,
as well as the instrument file, will also be maintained to
facilitate the production and distribution of new copies of
the complete documentation package for staff and others
requiring such documentation.

5.1.1. EVENT FILE 

The event file gives the basic "who, what, when, why,
where" of each data collection effort. Specifically, each
entry will include:

1. The date(s) and place(s) of the data collection
   events (e.g., FY81, at all MEPS; or June 23, 1983,
   at Fort Knox);

2. The sample(s) from whom data were obtained,
   including the identifier used to access the Sample
   Structure File;

3. The instrument(s) used (including the instrument
   "identifier" used in accessing the Instrument
   File); and

4. A concise description of the purpose and intended
   use of the data. (For data collected by project
   staff, this will be a summary of the justification
   statement developed prior to the data collection
   and will refer to the more complete statement.)

5.1.2. INSTRUMENT FILE

Researchers occasionally need access to the original
data collection instruments for such purposes as checking
the actual wording of particular questions, checking
potential skip patterns, and generating hypotheses
concerning oddities in the responses. In many systems of
data documentation, codebooks are constrained by variable
and option labelling that must fit the format of the
particular system employed. As a result, the full text of
the question or the response alternatives is not available
to the analyst. In addition, the "context" of the question
is not apparent in most codebooks. We will maintain a
complete file of all of the instruments used, organized by
an instrument identifier and accessed by instrument name
and by a topical index of the instruments. Security
restrictions may apply to some test instruments (e.g.,
ASVAB forms). We will investigate ways of satisfying
security concerns in such cases.

The instrument file will be maintained in hardcopy
form suitable for efficient copying on a Xerox 9400 as
copies are needed for new project staff members or other
researchers.



5.1.3. SAMPLE STRUCTURE FILE 

Each time a new data set is received, the sample (and
subsamples where appropriate) on which the data are based
will be identified and assigned a sample identifier. This
identifier, together with a more complete labelling of the
sample(s), will be entered into the Sample Structure File
that logs each sample and points to the relevant data
set(s). The degree of overlap with every other sample will
be ascertained and recorded in a sample structure matrix.
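
A sample structure matrix of this kind could be computed as
in the sketch below. The sample names and scrambled IDs are
hypothetical, and measuring overlap as a simple count of
shared soldiers is an assumption about how "degree of
overlap" would be recorded.

```python
# Hypothetical samples, each a set of scrambled soldier IDs.
samples = {
    "CV": {"A17", "B42", "C03", "D88"},  # e.g., a concurrent validation sample
    "T2": {"B42", "D88"},                # e.g., a second-tour subset
    "PPB": {"C03", "E51"},               # e.g., a predictor battery sample
}


def overlap_matrix(named_sets):
    """Count soldiers shared by each pair of samples (diagonal = sample size)."""
    names = sorted(named_sets)
    return {a: {b: len(named_sets[a] & named_sets[b]) for b in names}
            for a in names}


matrix = overlap_matrix(samples)
```

A cell equal to the smaller sample's size indicates that one
sample is a subset of the other, as with T2 and CV here.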

5.1.4. DATASET LOGS

Each of the raw and SAS data files comprising the data
base will be listed in a data set log. This log will show
all versions or generations of each data set beginning with
the initial tape(s) or card(s) received from the field or
from the data entry vendor. For each data set, the
location (e.g., NIH tape library, NIH disk pack, backup
facility tape library) will be indicated along with the
primary data set characteristics (storage mode, block size,
and record size where appropriate) and pointers to relevant
entries in the Sample and Instrument Files. Much of this
information will be maintained in the operating system's
on-line catalog for the current, operative version of each
file.



5.1.5. CODEBOOKS

Detailed information on each variable in the data base
will be provided in SAS codebooks. The codebooks will be
organized by dataset (relation) and data collection
instrument with file and instrument identifiers indicated
on the heading of each page. The specific contents of the
codebooks will include:

1. a variable name and a more complete description
   for each variable;

2. a summary of the characteristics of the variable
   (character or numeric, number of characters/number
   of decimal places);

3. the number of cases with valid responses and the
   number for which the variable was omitted or
   missing;

4. a label for each response option for all discrete
   variables; and

5. the actual frequency distribution for each
   discrete variable and appropriate summary
   statistics (mean, standard deviation, median,
   quartile points, minimum and maximum) for each
   continuous variable.
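
The statistical portion of a codebook entry (items 3 and 5
above) can be sketched as follows. The plan calls for SAS
codebooks; Python is used here only for illustration, None is
assumed to mark a missing response, and the example values
are made up.

```python
from collections import Counter
from statistics import mean, median, quantiles, stdev


def discrete_entry(values):
    """Valid/missing counts and a frequency distribution."""
    valid = [v for v in values if v is not None]
    return {"n_valid": len(valid), "n_missing": len(values) - len(valid),
            "freq": dict(Counter(valid))}


def continuous_entry(values):
    """Valid/missing counts and summary statistics."""
    valid = [v for v in values if v is not None]
    q1, _, q3 = quantiles(valid, n=4)  # quartile points
    return {"n_valid": len(valid), "n_missing": len(values) - len(valid),
            "mean": mean(valid), "sd": stdev(valid), "median": median(valid),
            "q1": q1, "q3": q3, "min": min(valid), "max": max(valid)}


sex = discrete_entry(["M", "F", "M", None, "M"])
score = continuous_entry([40, 50, 60, None, 70, 80])
```

Reporting the missing count alongside every statistic keeps
the analyst aware of how many cases each figure rests on.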

5.1.6. VARIABLE CROSS-REFERENCE FILE

The Variable Cross-Reference File will contain an
alphabetic and a topical listing of all of the variables in
the entire data base. For each variable, the appropriate
instruments, data sets, and samples will be indicated. The
topical index will be of particular importance in providing
researchers a means of getting an efficient overview of the
system and determining the availability of data to meet
specific needs. During the planning phase, an initial
variable taxonomy will be developed, and this taxonomy will
be updated during the project as appropriate. In
developing this taxonomy, multiple listings for variables
will be assumed (e.g., initial ASVAB scores might be listed
under "Accession Data," under "Aptitude Measures," and also
under "Performance Predictors").
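
The two listings can be derived from a single catalog, as in
this sketch. The variable names, topic assignments, and data
set names are hypothetical; the point illustrated is that one
variable may appear under several topics.

```python
from collections import defaultdict

# Hypothetical catalog: variable -> (topics, data sets). All names are made up.
catalog = {
    "ASVAB_GS": (["Accession Data", "Aptitude Measures"], ["DS1", "DS2"]),
    "PAYGRADE": (["Career Progress"], ["DS3"]),
    "HS_GRAD": (["Accession Data"], ["DS1"]),
}


def alphabetic_index(cat):
    """Variables in name order, each with the data sets that contain it."""
    return [(name, cat[name][1]) for name in sorted(cat)]


def topical_index(cat):
    """Topic -> sorted variable names; multiple listings are allowed."""
    topics = defaultdict(list)
    for name, (topic_list, _) in cat.items():
        for topic in topic_list:
            topics[topic].append(name)
    return {topic: sorted(names) for topic, names in sorted(topics.items())}


by_name = alphabetic_index(catalog)
by_topic = topical_index(catalog)
```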

5.1.7. DATA HISTORY DOCUMENTATION

Data history documentation will make it possible to
examine each step in the creation and editing of the final
datasets. This history documentation of each dataset will
consist of a flowchart showing the files and programs used
at each step in the creation/editing process and the output
from each computer run in this process. The output will
show both all of the program statements used and any
printed results (e.g., record counts or warning messages).
This documentation will be maintained on-line while the
datasets are active.

5.2. Dissemination

Several different methods will be employed to make
information on the data base available to appropriate
individuals. News of immediate importance will be placed
in an on-line electronic bulletin board with headlines
announced through each user's logon profile. Similar
on-line aids, accessible from Project accounts, will be
used to point to the WYLBUR versions of the data
documentation described in Section 5.1.

As the data from each new data collection become
available, an informal workshop will be held. Printed
copies of the documentation will be distributed to
authorized users, and special characteristics of the data
will be discussed. An initial workshop, held in May 1983,
covered data storage and access procedures and information
on RAPID, WYLBUR, and NIH computer facilities in general,
as well as the detailed contents of the FY81/82 cohort
files.



6. DATABASE SECURITY 


6.1. The Need for Security

Whenever a large amount of data on individuals is
maintained and stored, it is necessary to develop
procedures to protect that data from compromise. The
security of the Project A and B LRDB is particularly
important for a number of reasons. Some of the data
collected on individual soldiers, such as promotions,
paygrade, or disciplinary actions, will be private in
nature, and the privacy of that information must be
maintained. Since many researchers will be accessing the
LRDB for a variety of uses, the integrity of the data must
be maintained to insure that the data remain accurate and
consistent across uses. Finally, it is necessary to secure
the data base to insure the Army's ownership of the data;
in other words, to insure that the data within the LRDB
are used only for authorized Project A and B research.

6.2. Security Procedures

The security of the LRDB will be protected in a number
of ways. Soldier social security numbers (SSN) will be
routinely encrypted to insure the privacy of each soldier's
records. Access to the LRDB will be controlled both to
further protect soldier privacy and to insure proper use of
the data. To provide further physical security, a log will
be maintained for the LRDB system that will note each
attempted access of the LRDB and whether the access was
authorized or not. Finally, a set of data processing
practices will be established to provide security for the
information managing aspects of data within the LRDB. Each
of these procedures will be detailed in the subsections
that follow.

6.2.1. SSN ENCRYPTION

The key aspect to guaranteeing the privacy of
individual soldier data will be the coding or encrypting of
each soldier's identifier. This encryption will be
accomplished by scrambling each soldier's SSN in an
unpredictable way. The algorithm that will do the
encrypting (and if needed, decrypting) will be known only
to the LRDB manager and ARI in-house data base
administrators. A printed copy of the algorithm will be
securely maintained by the Project A COTR. All of the data
files of the LRDB that can be routinely accessed and any
project workfiles generated from the larger LRDB files will
use only this encrypted SSN as the soldier identifier.
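
The plan does not describe the scrambling algorithm itself,
so the sketch below is not the project's method. It
substitutes a keyed one-way function (HMAC-SHA256 from the
Python standard library) and an administrator-only lookup
table in place of decryption. The key, class name, and SSN
shown are hypothetical.

```python
import hashlib
import hmac


def scramble_ssn(ssn: str, secret_key: bytes) -> str:
    """Derive a stable, hard-to-guess identifier from an SSN and a secret key."""
    digits = "".join(ch for ch in ssn if ch.isdigit())
    digest = hmac.new(secret_key, digits.encode("ascii"), hashlib.sha256)
    return digest.hexdigest()[:16]


class IdRegistry:
    """Administrator-only table for tracing a scrambled ID back to its SSN."""

    def __init__(self, secret_key: bytes):
        self._key = secret_key
        self._reverse = {}

    def encode(self, ssn: str) -> str:
        sid = scramble_ssn(ssn, self._key)
        self._reverse[sid] = ssn
        return sid

    def decode(self, sid: str) -> str:
        return self._reverse[sid]


registry = IdRegistry(b"demo-key-only")
sid = registry.encode("123-45-6789")
```

Only the scrambled value would appear in routinely accessed
files; the key and the reverse table would stay with the
data base administrators.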

6.2.2. CONTROLLED FILE ACCESS

The integrity and accuracy of the LRDB data will be
maintained by controlling the access to the large files or
relations within the data base. This procedure will also
further contribute to the privacy protection of individual
soldier records. In general, the system to be adopted will
use the RACF procedure available at NIH to allow the access
of particular files to authorized users. Under RACF,
different levels of access can be granted to different
users. By specifying a "universal access" of "NONE,"
access can be restricted to only those users granted
specific exceptions. In most cases, such users will be
allowed "READ" access only. Such users will have to
provide an eight-character RACF password (different for
each user) in order to access the datafiles for which they
have been given access. Using the provisions of RACF, a
series of access "levels" will be developed which should
provide timely access to relevant data needed by Project A
and B researchers and yet protect the security and
integrity of the data.

Level 1. At the highest level of access will be the
data base administrators. Currently these individuals are
Dr. Lauress Wise and Ms. Winnie Young of AIR and Dr. Paul
Rossmeissl and Ms. Frances Grafton of ARI. Level 1
personnel will have access to all of the files and
relations within the data base. Furthermore, only Level 1
personnel will be able to enter data into the data base or
modify data already stored in the data base. Thus, the
data base administrators must assume responsibility for
data entry, editing, and the storage of original data
materials (i.e., tapes, punched cards) in a secure
location. In addition, it will be the duty of the Level 1
personnel to create Level 4 workfiles as they are needed by
other project researchers.

Level 2. Personnel at the second access level will be
able to directly read data from all of the files in the
data base with the exception of the Link File (see Section
4.3.2), which will contain basic soldier identifying
information. This exception is made to maintain soldier
privacy. It is planned that two members of the Project A
staff will have Level 2 access to the LRDB. Dr. Ming-mei
Wang, the deputy Task 1 leader for statistical analyses,
will need to be able to quickly access all data files,
since the validation analyses of Task 1 span all tasks and
data sets collected within Project A. Dr. Lawrence Hanser,
the Task 4 monitor, will also be provided with Level 2
access. Dr. Hanser has been associated with Projects A and
B since their inception and will back up the ARI in-house
Level 1 LRDB staff in insuring that ARI has complete access
to the LRDB for in-house research.

Level 3. Most project personnel will have some Level
3 LRDB access. Researchers at this level will have direct
access to all files that are generated by the particular
tasks they are investigating. Furthermore, they will have
direct access to the files created by other tasks that
directly impact their work. For example, Task 2
researchers will have direct access to the task analysis
data collected by Task 5 so that the new predictors that
are developed will address areas of the criterion space not
currently covered by ASVAB.

Level 4. The most common way in which project
researchers will access the LRDB is through the creation of
workfiles (see Section 4.4). By requesting the creation
of a workfile, a researcher will be able to obtain data
from all of the large files in the data base except the
Link File, which will contain soldier identifying
information and will always be kept private and secure.
The key aspect of workfiles relevant to LRDB security is
that the researcher will only receive the data that he or
she requested and there will be a precise record of who
requested what data. When a project scientist requires a
workfile, he or she will submit a data request form to
either the contractor or ARI data base administrators.
This request form will ask:

(1) Who wants the data?

(2) What variables are needed?

(3) What sample is needed?

(4) Which LRDB file or files contain the data being
    requested?

(5) Why are the data needed?

(6) Will the data be downloaded to hardware other
    than the NIH computer facility?

(7) What will be done with the data after its current
    use is completed (i.e., file will be scratched or
    saved for future use)?

In addition, each data request form will remind the
researcher seeking data that the LRDB is the property of the
Army, to be used only for Project A and B research, and that
publications, papers, and briefing charts based upon
these data must be submitted to ARI for clearance before
they are presented to the public.

Paper copies of the workfile request will be available
to all Project A and B scientists, but it is expected that
most researchers will make use of a request form that will
be stored on-line at NIH and can be quickly sent to the data
base administrators using WYLBUR electronic mail. It is
expected that most workfile requests will be filled using
overnight runs at NIH (see Section 4.4) and should be
ready within twenty-four hours of the original request for
data.
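
The four access levels described in this section can be
summarized in a small rule table. The sketch below is an
illustration only: it does not use RACF, and the file names
and the function names can_read and can_write are
hypothetical.

```python
LINK_FILE = "LINK"  # hypothetical name for the Link File


def can_read(level, file_name, task_files=frozenset()):
    """Return True if a user at this level may read file_name directly."""
    if level == 1:
        return True                    # administrators: every file
    if level == 2:
        return file_name != LINK_FILE  # everything except the Link File
    if level == 3:                     # only files tied to the user's tasks
        return file_name in task_files and file_name != LINK_FILE
    return False                       # Level 4: workfiles only, on request


def can_write(level):
    """Only Level 1 personnel may enter or modify data."""
    return level == 1
```

For example, can_read(3, "FTF", task_files={"FTF"}) is true,
while a Level 4 researcher would instead submit a data
request form and receive a workfile.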

6.2.3. LRDB LOG 

The procedure used to execute the RAPID data base
management system's data retrieval programs has been
modified to log a record of each access or attempted access
to the data base. This access log will be reviewed
weekly to assure that no inappropriate access has been
attempted. In addition, the monthly accounting information
of each project user will be monitored for any indication
of unauthorized access to the LRDB. These audit trails
will serve as a second level of protection against
unauthorized use of the data by anyone who manages to
obtain the necessary RACF passwords. They will not
directly prevent unauthorized access to the LRDB, but the
threat of exposure should serve as a significant deterrent
to attempts at unauthorized LRDB access. The log will also
help the data base administrators decide which project
files should be stored on disk rather than tape by
providing information as to how frequently data are
requested from any given file.
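
The two uses of the log described above (spotting
inappropriate access and counting requests per file) can be
sketched as follows. This is a minimal illustration, not the
RAPID logging procedure itself; the entry layout of
(user, file, succeeded) and the function name are assumptions.

```python
from collections import Counter


def review_access_log(entries, authorized_users):
    """Summarize a batch of log entries for the weekly review.

    entries: iterable of (user_id, file_name, succeeded) tuples
             (an assumed layout, not the actual log format).
    Returns the entries needing attention and a per-file request count.
    """
    entries = list(entries)  # allow any iterable; we pass over it twice
    # Flag entries from unknown users and any failed access attempts.
    flagged = [e for e in entries
               if e[0] not in authorized_users or not e[2]]
    # Request frequency per file, to inform the disk-versus-tape decision.
    requests_per_file = Counter(file_name for _, file_name, _ in entries)
    return flagged, requests_per_file
```

Files with high counts in requests_per_file would be the
natural candidates for disk storage.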



6.2.4. OTHER PHYSICAL SECURITY PROCEDURES

Much of the data that will be entered into the LRDB
will come from existing Army sources, such as the EMF.
Additional precautions beyond those mentioned above will be
taken to secure the information on these data tapes. The
key aspect of this additional security is to collect and
store information from these sources only if it is
essential to the goals of Project A and B. For example,
with regard to the EMF, this LRDB plan indicates
specifically which variables will be needed. Other
variables, in particular each soldier's location and
assigned unit, will not be acquired in any form. In
addition to limiting the data elements to be stored, the
number of soldiers for whom any data will be retained will
be limited. As indicated in Section 3 above, the LRDB will
not obtain and keep information on all active service
personnel. Only data from personnel selected for Project A
and B research will be maintained.

6.2.5. DATA BASE ENTRY AND EDITING SECURITY PROCEDURES

In addition to providing for the physical security of
the data base, procedures have been established to maintain
soldier privacy within the areas of data base information
management. Included within the broad topic of information
management are such specific areas as handling of raw data,
maintenance of raw data forms, and procedures for dealing
with processed data (such as printouts or written reports).
This section presents the procedures that will be used to
provide security during data entry and editing, while the
following section presents procedures that will be followed
to protect soldier privacy in the analysis and reporting of
data.

Data entry. All forms for data collected in the field
will be shipped to the data entry station at AIR in
sequentially numbered packages via certified mail. Within
24 hours of their receipt, an entry in a data entry log
will be noted for each package. This log will be
maintained on-line, but will be backed up by a hardcopy
following each log update. The log will contain
identification of each package received, the number and
type of the documents included, and the current status
(entry preparation, entry, verification, editing, or
shredded) of the data.
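
A minimal sketch of such a log entry is given below, assuming
that a package moves through the five statuses strictly in the
order listed (the plan does not state this) and using
hypothetical class and function names. Because packages are
sequentially numbered, a gap in the numbers received points to
a package that has not arrived.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List


class Status(Enum):
    ENTRY_PREPARATION = 1
    ENTRY = 2
    VERIFICATION = 3
    EDITING = 4
    SHREDDED = 5


@dataclass
class PackageLogEntry:
    package_number: int
    document_counts: Dict[str, int]  # document type -> number received
    status: Status = Status.ENTRY_PREPARATION

    def advance(self) -> None:
        """Move to the next status (assumed strictly sequential)."""
        if self.status is Status.SHREDDED:
            raise ValueError("package has already been shredded")
        self.status = Status(self.status.value + 1)


def missing_packages(log: List[PackageLogEntry]) -> List[int]:
    """Return package numbers absent between the lowest and highest logged."""
    present = {entry.package_number for entry in log}
    if not present:
        return []
    return [n for n in range(min(present), max(present) + 1)
            if n not in present]
```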

Data editing. While the data are being edited, the
data collection forms (the raw data) will be stored in a
locked room at a site removed from any post, where the
individual responses should not be of interest to anyone.

Data integrity of the new data will be insured through
thorough editing of the data. This editing will include
complete verification of all entered data, a reconciliation
of the resultant record counts against the initial document
counts, and relational editing of all new data to
appropriate existing sources (e.g., the SSNs and birth
dates match the master link file). Once the data have been
entered into the LRDB and completely edited, the AIR data
base administrator will review the completeness of the data
entry/editing process. In performing this review, he will
consult with the task leader and ARI task monitor
responsible for the data collection. Following any further
revisions resulting from this review and a final approval
of the editing, a back-up copy of the resulting
datafiles will be created that does not contain any
personnel identifier other than the encrypted SSN. This
tape will be removed from the NIH facilities and stored at
a separate location.
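
The count reconciliation and relational edit described above
can be illustrated with the following sketch. It is not the
project's editing program: the record layout, the function
names, and the assumption that each input document yields
exactly one record are all hypothetical.

```python
def relational_edit(records, master_link):
    """Return the SSNs of new records that fail the relational edit.

    records:     list of dicts with "ssn" and "birth_date" keys
                 (an assumed layout).
    master_link: dict mapping SSN -> birth date, standing in for
                 the master link file.
    A record fails if its SSN is unknown or its birth date differs.
    """
    failures = []
    for record in records:
        ssn = record["ssn"]
        if ssn not in master_link or master_link[ssn] != record["birth_date"]:
            failures.append(ssn)
    return failures


def counts_reconcile(record_count, document_counts):
    """Check entered records against initial document counts.

    Assumes one record per input document; document_counts holds
    the count noted for each package in the data entry log.
    """
    return record_count == sum(document_counts)
```

Records returned by relational_edit() would be resolved
against the source documents before the editing is approved.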

Once the data have been backed up onto tape, all of
the input documents will be shredded. A final count will
be made of the number of documents shredded and this count
will be checked against the initial document counts and the
data entry log. At the same time that the data input forms
are destroyed, all printouts generated during the editing
of the data will be reviewed. Edit run summaries and other
general information will be bound together to form the
detailed documents of the editing process. Any other
printouts, including any with potentially identifying
information, will be destroyed along with the input
documents. Likewise, any computer workfiles containing
possible identifying information (excluding the master link
file), along with all summary files not needed as backup or
documentation, will be deleted from the system and then
overwritten. The ARI data base administrator (or someone
he delegates) will oversee this entire process.




6.2.6. DATA ANALYSES AND REPORTING SECURITY PROCEDURES

All workfiles, printouts, and analyses produced by the
project data base personnel will contain a header
indicating that the products were based on personnel data
and that the product should therefore be handled in an
appropriate manner. When researchers are finished with any
data, they will be required to specify the disposition of
all workfiles and computer printouts that were created
during the analyses. If work is of a continuing nature, a
list of the workfiles and printouts will be retained for
verification at the final completion of the analyses. When
all analyses are completed, the ARI reviewer (the data base
administrator or the appropriate task monitor) will approve
the contents of any workfiles or printed documents that are
to be retained. The primary purpose of this review is to
assure that no information that might be used to infer
individual soldier identities is retained.

All reports, journal articles, and conference papers
based on Project A and Project B research must be cleared
by ARI before publication. This clearance process is
primarily concerned with the political and scientific
sensitivity of the research and typically is composed of
three levels of clearance (team chief or task monitor, tech
area, and research laboratory). In the case of reports
based on LRDB data, these reviews will be expanded to
assure that information is not included in the reports that
might eventually be used to ascertain the identity of
individual soldiers.



Summary

Any set of procedures designed to store data
electronically needs to balance the ease with which data
can be accessed against the security of the data base. The
procedures presented in this section tend to favor the
security aspect of this balance. The number of data files
that most project researchers will be able to access
directly will be quite limited. Furthermore, only the data
base administrators will be able to add or modify data, and
access the true soldier identifying information. However,
efficient use and rapid creation of the workfiles should
provide any project scientist with the data that he or she
needs to perform the required research.